3260 papers • 126 benchmarks • 313 datasets
The goal of Q-learning is to learn a policy that tells an agent which action to take under which circumstances. (Image credit: Playing Atari with Deep Reinforcement Learning)
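At its core this is a single temporal-difference update, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. A minimal tabular sketch; the environment size, step size, and discount below are illustrative assumptions, not taken from any listed paper:

```python
# Minimal tabular Q-learning sketch (all sizes and hyperparameters assumed).
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, done):
    # TD target bootstraps from the greedy value of the next state:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

q_update(0, 1, 1.0, 2, done=False)  # example transition
```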
These leaderboards are used to track progress in Q-learning.
Use these libraries to find Q-learning models and implementations.
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
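The actor step in such deterministic policy gradient methods ascends the critic's value of the actor's own action, Q(s, μ(s)). A minimal PyTorch sketch; the network sizes, names, and stand-in batch are illustrative assumptions:

```python
# Sketch of a deterministic policy gradient actor update (DDPG-style).
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # assumed dimensions
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

states = torch.randn(32, obs_dim)  # stand-in minibatch of observations
actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
actor_opt.zero_grad()
actor_loss.backward()   # only the actor's optimizer steps here; a full agent
actor_opt.step()        # would also train the critic on TD targets
```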
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning; it outperforms all previous approaches on six of the Atari games evaluated and surpasses a human expert on three of them.
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
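A sketch of the adaptation described above (the Double DQN target) next to the standard DQN target it replaces; `q_net` and `q_target_net` are assumed callables mapping a batch of states to per-action values, and tensor shapes are assumptions:

```python
import torch

def dqn_target(q_target_net, rewards, next_states, dones, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates the next
    # action, which is the source of the overestimation the paper analyzes.
    next_q = q_target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q

def double_dqn_target(q_net, q_target_net, rewards, next_states, dones, gamma=0.99):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, decoupling action selection from evaluation.
    best = q_net(next_states).argmax(dim=1, keepdim=True)
    next_q = q_target_net(next_states).gather(1, best).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```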
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
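A sketch of the soft (entropy-regularized) TD target at the heart of maximum-entropy methods like this one: the bootstrap value is penalized by the log-probability of the next action, rewarding the policy for staying stochastic. The two-critic minimum, temperature value, and names are assumptions:

```python
import torch

def soft_q_target(rewards, next_q1, next_q2, next_log_prob, dones,
                  gamma=0.99, alpha=0.2):
    # Entropy bonus: subtract alpha * log pi(a'|s') from the bootstrap value.
    next_v = torch.min(next_q1, next_q2) - alpha * next_log_prob
    return rewards + gamma * (1.0 - dones) * next_v
```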
This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
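A sketch of that clipped double-Q target: bootstrap from the minimum over two target critics. The target-policy smoothing noise shown alongside it, and all names and hyperparameters, are assumptions:

```python
import torch

def clipped_double_q_target(critic1_t, critic2_t, actor_t, rewards, next_states,
                            dones, gamma=0.99, noise_std=0.2, noise_clip=0.5,
                            act_limit=1.0):
    a_next = actor_t(next_states)
    noise = (torch.randn_like(a_next) * noise_std).clamp(-noise_clip, noise_clip)
    a_next = (a_next + noise).clamp(-act_limit, act_limit)  # smoothed target action
    sa = torch.cat([next_states, a_next], dim=1)
    # Minimum over the pair of critics limits overestimation of the target.
    min_q = torch.min(critic1_t(sa), critic2_t(sa)).squeeze(1)
    return rewards + gamma * (1.0 - dones) * min_q
```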
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
This work creates a rapid prototype of Q-learning with neural-network approximators for Samu, an experiment that shows a significant improvement in Samu's learning when an LZW tree is used to narrow the number of possible Q-actions.
Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
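A sketch of the conservative penalty CQL adds to the ordinary TD loss, in a discrete-action form (an assumption; the paper also handles continuous actions by sampling). It pushes Q-values down on all actions via a logsumexp and up on the actions that actually appear in the dataset:

```python
import torch

def cql_penalty(q_values, data_actions):
    # q_values: (batch, n_actions); data_actions: (batch,) dataset action indices.
    logsumexp_q = torch.logsumexp(q_values, dim=1)
    data_q = q_values.gather(1, data_actions.unsqueeze(1)).squeeze(1)
    return (logsumexp_q - data_q).mean()  # scaled and added to the TD loss
```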
This work explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black-box optimization technique.
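A minimal sketch of the ES estimator: perturb the parameters with Gaussian noise, score each perturbation, and step along reward-weighted noise. The toy quadratic objective and hyperparameters are illustrative assumptions:

```python
import numpy as np

def es_step(theta, fitness, npop=50, sigma=0.1, lr=0.01, rng=np.random):
    noise = rng.randn(npop, theta.size)              # one perturbation per "worker"
    rewards = np.array([fitness(theta + sigma * n) for n in noise])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize scores
    grad_est = noise.T @ rewards / (npop * sigma)    # black-box gradient estimate
    return theta + lr * grad_est

theta = np.zeros(3)
for _ in range(200):                                 # maximize -||theta - 1||^2
    theta = es_step(theta, fitness=lambda w: -np.sum((w - 1.0) ** 2))
```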
This work proposes an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization.
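If this refers to implicit Q-learning-style methods, the key ingredient is expectile regression, which approximates a maximum over dataset actions without ever querying actions outside the data. A hedged sketch; the expectile parameter tau and tensor shapes are assumptions:

```python
import torch

def expectile_loss(q_values, v_values, tau=0.7):
    diff = q_values - v_values
    # Asymmetric squared loss: with tau > 0.5, positive errors are weighted
    # more, biasing V toward the upper envelope of in-dataset Q-values.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()
```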