3260 papers • 126 benchmarks • 313 datasets
These leaderboards track progress in value prediction.
It is found that simply increasing the number of Q-networks used with clipped Q-learning substantially outperforms existing offline RL methods on various tasks; an ensemble-diversified actor-critic algorithm is then proposed that reduces the number of required ensemble networks to a tenth of the naive ensemble.
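As a minimal NumPy sketch of the clipped Q-learning idea this summary refers to, the target for a state-action pair takes the minimum next-state Q-value across the ensemble to counteract overestimation bias. The function name, ensemble size, and all numbers below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def clipped_ensemble_target(q_next, reward, gamma=0.99):
    """Clipped Q-learning target: take the minimum next-state
    Q-value across all ensemble members, which penalizes actions
    whose value estimates disagree (high-uncertainty actions)."""
    return reward + gamma * np.min(q_next)

# Hypothetical next-state Q-value estimates from a 10-network ensemble.
q_next = rng.normal(loc=5.0, scale=1.0, size=10)

target = clipped_ensemble_target(q_next, reward=1.0)
```

Growing the ensemble makes the minimum more pessimistic, which is the effect the paper exploits in the offline setting.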
This paper proposes a novel deep reinforcement learning architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network and outperforms Deep Q-Network on several Atari games even with short-lookahead planning.
This paper proposes an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning, and formulates ACE in the option framework by extending the option-critic architecture with deterministic intra-option policies, revealing a relationship between ensembles and options.
This work shows that making the Transformer architecture aware of the syntactic structure of code increases the margin by which a Transformer-based system outperforms previous systems, and advances the state of the art in the accuracy of code prediction (next-token prediction) used in autocomplete systems.
The timeXplain framework is employed in a large-scale experimental comparison of several state-of-the-art time series classifiers, and similarities are discovered between seemingly distinct classification concepts, such as residual neural networks and elastic ensembles.
DATE, a Dual-task Attentive Tree-aware Embedding model, is proposed to classify and rank illegal trade flows that contribute the most to overall customs revenue when caught.
PIVEN is presented, a deep neural network that produces both a prediction interval (PI) and a prediction of a specific value; its approach yields tighter uncertainty bounds than the current state-of-the-art approach for producing PIs, while maintaining comparable performance to the state-of-the-art approach for specific value prediction.
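One way to guarantee that a point prediction lies inside its own prediction interval, in the spirit of this summary, is to express the value as a convex combination of the interval bounds. The head below is a hypothetical NumPy sketch under that assumption; the function and weight names are not from PIVEN itself.

```python
import numpy as np

def interval_value_head(features, W_u, W_l, W_v):
    """Hypothetical output head producing (lower, value, upper).
    The point prediction is a sigmoid-weighted convex combination
    of the bounds, so lower <= value <= upper always holds."""
    upper = features @ W_u
    lower = upper - np.exp(features @ W_l)        # exp(.) > 0 enforces lower < upper
    s = 1.0 / (1.0 + np.exp(-(features @ W_v)))   # sigmoid gate in (0, 1)
    value = s * upper + (1.0 - s) * lower
    return lower, value, upper
```

Training would then combine an interval-coverage/width loss on the bounds with a regression loss on the value, which is the dual objective the summary describes.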
This work shows that random deep action-conditional predictions, when used as auxiliary tasks, yield state representations whose control performance is competitive with state-of-the-art hand-crafted auxiliary tasks such as value prediction, pixel control, and CURL in both Atari and DeepMind Lab tasks.
This work presents "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location.
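Action selection over a spatial action map as described here reduces to an argmax over pixels. A minimal NumPy sketch, with an assumed 64x64 map of randomly generated scores standing in for a network's dense output:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dense action-value map aligned with a 64x64 input image:
# each pixel scores the local navigational endpoint at that scene location.
action_map = rng.random((64, 64))

# The chosen action is the (row, col) pixel coordinate with the highest score.
best_idx = np.unravel_index(np.argmax(action_map), action_map.shape)
```

Because actions share the spatial structure of the input image, a fully convolutional network can predict the whole map in one forward pass.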