3260 papers • 126 benchmarks • 313 datasets
The goal of Q-learning is to learn a policy that tells an agent which action to take under which circumstances. (Image credit: Playing Atari with Deep Reinforcement Learning)
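At its core this is a single temporal-difference update, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. A minimal tabular sketch; the environment size, step size, and discount below are illustrative assumptions, not taken from any listed paper:

```python
# Minimal tabular Q-learning sketch (all sizes and hyperparameters assumed).
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, done):
    # TD target bootstraps from the greedy value of the next state:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

q_update(0, 1, 1.0, 2, done=False)  # example transition
```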
These leaderboards are used to track progress in Q-learning.
Use these libraries to find Q-learning models and implementations.
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
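The actor step in such deterministic policy gradient methods ascends the critic's value of the actor's own action, Q(s, μ(s)). A minimal PyTorch sketch; the network sizes, names, and stand-in batch are illustrative assumptions:

```python
# Sketch of a deterministic policy gradient actor update (DDPG-style).
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # assumed dimensions
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

states = torch.randn(32, obs_dim)  # stand-in minibatch of observations
actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
actor_opt.zero_grad()
actor_loss.backward()   # only the actor's optimizer steps here; a full agent
actor_opt.step()        # would also train the critic on TD targets
```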
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning; it outperforms all previous approaches on six of the Atari games evaluated and surpasses a human expert on three of them.
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
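A sketch of the adaptation described above (the Double DQN target) next to the standard DQN target it replaces; `q_net` and `q_target_net` are assumed callables mapping a batch of states to per-action values, and tensor shapes are assumptions:

```python
import torch

def dqn_target(q_target_net, rewards, next_states, dones, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates the next
    # action, which is the source of the overestimation the paper analyzes.
    next_q = q_target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q

def double_dqn_target(q_net, q_target_net, rewards, next_states, dones, gamma=0.99):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, decoupling action selection from evaluation.
    best = q_net(next_states).argmax(dim=1, keepdim=True)
    next_q = q_target_net(next_states).gather(1, best).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```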
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
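A sketch of the soft (entropy-regularized) TD target at the heart of maximum-entropy methods like this one: the bootstrap value is penalized by the log-probability of the next action, rewarding the policy for staying stochastic. The two-critic minimum, temperature value, and names are assumptions:

```python
import torch

def soft_q_target(rewards, next_q1, next_q2, next_log_prob, dones,
                  gamma=0.99, alpha=0.2):
    # Entropy bonus: subtract alpha * log pi(a'|s') from the bootstrap value.
    next_v = torch.min(next_q1, next_q2) - alpha * next_log_prob
    return rewards + gamma * (1.0 - dones) * next_v
```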
This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
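A sketch of that clipped double-Q target: bootstrap from the minimum over two target critics. The target-policy smoothing noise shown alongside it, and all names and hyperparameters, are assumptions:

```python
import torch

def clipped_double_q_target(critic1_t, critic2_t, actor_t, rewards, next_states,
                            dones, gamma=0.99, noise_std=0.2, noise_clip=0.5,
                            act_limit=1.0):
    a_next = actor_t(next_states)
    noise = (torch.randn_like(a_next) * noise_std).clamp(-noise_clip, noise_clip)
    a_next = (a_next + noise).clamp(-act_limit, act_limit)  # smoothed target action
    sa = torch.cat([next_states, a_next], dim=1)
    # Minimum over the pair of critics limits overestimation of the target.
    min_q = torch.min(critic1_t(sa), critic2_t(sa)).squeeze(1)
    return rewards + gamma * (1.0 - dones) * min_q
```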
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
This work creates a rapid prototype of Q-learning with neural-network approximators for Samu, an experiment that shows a significant improvement in Samu's learning when an LZW tree is used to narrow the number of possible Q-actions.
Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
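A sketch of the conservative penalty CQL adds to the ordinary TD loss, in a discrete-action form (an assumption; the paper also handles continuous actions by sampling). It pushes Q-values down on all actions via a logsumexp and up on the actions that actually appear in the dataset:

```python
import torch

def cql_penalty(q_values, data_actions):
    # q_values: (batch, n_actions); data_actions: (batch,) dataset action indices.
    logsumexp_q = torch.logsumexp(q_values, dim=1)
    data_q = q_values.gather(1, data_actions.unsqueeze(1)).squeeze(1)
    return (logsumexp_q - data_q).mean()  # scaled and added to the TD loss
```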
This work explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black-box optimization technique.
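A minimal sketch of the ES estimator: perturb the parameters with Gaussian noise, score each perturbation, and step along reward-weighted noise. The toy quadratic objective and hyperparameters are illustrative assumptions:

```python
import numpy as np

def es_step(theta, fitness, npop=50, sigma=0.1, lr=0.01, rng=np.random):
    noise = rng.randn(npop, theta.size)              # one perturbation per "worker"
    rewards = np.array([fitness(theta + sigma * n) for n in noise])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize scores
    grad_est = noise.T @ rewards / (npop * sigma)    # black-box gradient estimate
    return theta + lr * grad_est

theta = np.zeros(3)
for _ in range(200):                                 # maximize -||theta - 1||^2
    theta = es_step(theta, fitness=lambda w: -np.sum((w - 1.0) ** 2))
```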
This work proposes an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization.
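If this refers to implicit Q-learning-style methods, the key ingredient is expectile regression, which approximates a maximum over dataset actions without ever querying actions outside the data. A hedged sketch; the expectile parameter tau and tensor shapes are assumptions:

```python
import torch

def expectile_loss(q_values, v_values, tau=0.7):
    diff = q_values - v_values
    # Asymmetric squared loss: with tau > 0.5, positive errors are weighted
    # more, biasing V toward the upper envelope of in-dataset Q-values.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()
```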