3260 papers • 126 benchmarks • 313 datasets
Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy: a decision-making strategy that maximizes the expected long-term reward.
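As a concrete illustration of that interaction loop, here is a minimal sketch of tabular Q-learning on a hypothetical 5-state chain where the agent is rewarded only for reaching the last state. The environment and all hyperparameters are illustrative, not taken from any library.

import numpy as np

n_states, n_actions = 5, 2              # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # step size, discount, exploration rate

def step(s, a):
    """Move along the chain; reward 1 only for reaching the last state."""
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, float(s2 == n_states - 1), s2 == n_states - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        if np.random.rand() < epsilon:                       # explore
            a = np.random.randint(n_actions)
        else:                                                # exploit, random tie-break
            a = np.random.choice(np.flatnonzero(Q[s] == Q[s].max()))
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * Q[s2].max())  # bootstrapped TD target
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2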
These leaderboards are used to track progress in Reinforcement Learning (RL).
Use these libraries to find Reinforcement Learning (RL) models and implementations.
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
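A hedged sketch of the core actor update in this deterministic-policy-gradient setting, assuming PyTorch; the networks and dimensions below are illustrative stand-ins, not the paper's code. The actor is updated by ascending the critic's estimate of Q(s, mu(s)), which is what makes the gradient deterministic: no expectation over actions is needed.

import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

state = torch.randn(32, 8)            # batch of states (stand-in for replay samples)
action = actor(state)                 # deterministic action mu(s)
# Deterministic policy gradient: ascend the critic's value of the actor's action.
actor_loss = -critic(torch.cat([state, action], dim=1)).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()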
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
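The essential training step pairs an online Q-network with a periodically synchronized target network; a minimal PyTorch sketch with stand-in tensors and placeholder shapes follows.

import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())   # synced copy, updated periodically

s, a = torch.randn(32, 4), torch.randint(0, 2, (32, 1))
r, s2 = torch.randn(32, 1), torch.randn(32, 4)
done = torch.zeros(32, 1)

with torch.no_grad():
    # Standard DQN target: max over next-state actions under the target network.
    y = r + 0.99 * (1 - done) * target_net(s2).max(dim=1, keepdim=True).values
loss = nn.functional.smooth_l1_loss(q_net(s).gather(1, a), y)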
This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
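The adaptation (Double DQN) decouples action selection from action evaluation. Reusing the stand-in networks and batch from the DQN sketch above, the only change is how the target is formed:

with torch.no_grad():
    # Double DQN: select the argmax action with the online network,
    # but evaluate it with the target network, reducing overestimation.
    best_a = q_net(s2).argmax(dim=1, keepdim=True)
    y = r + 0.99 * (1 - done) * target_net(s2).gather(1, best_a)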
This paper presents an adaptation of actor-critic methods that considers the action policies of other agents and successfully learns policies that require complex multi-agent coordination.
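A sketch of the centralized-critic idea, assuming PyTorch: the critic conditions on every agent's observation and action, while each (omitted) actor would see only its own observation. All shapes are illustrative.

import torch
import torch.nn as nn

obs_dim, act_dim, n_agents = 6, 2, 2
# Centralized critic: scores the joint state-action of all agents at once.
central_critic = nn.Sequential(
    nn.Linear(n_agents * (obs_dim + act_dim), 64), nn.ReLU(), nn.Linear(64, 1))

obs = [torch.randn(32, obs_dim) for _ in range(n_agents)]
acts = [torch.randn(32, act_dim) for _ in range(n_agents)]
q = central_critic(torch.cat(obs + acts, dim=1))   # one value per joint sample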
This paper proposes a model-agnostic algorithm for meta-learning, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of learning problems, including classification, regression, and reinforcement learning.
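A minimal sketch of the inner/outer loop on a toy regression problem, assuming PyTorch. The tasks, loss, and step sizes are illustrative, and a real implementation would use separate support/query samples for the inner and outer losses.

import torch

theta = torch.randn(2, requires_grad=True)        # meta-parameters
inner_lr, outer_lr = 0.01, 0.001
opt = torch.optim.SGD([theta], lr=outer_lr)

def task_loss(params, a, b):
    # Hypothetical per-task loss: fit the target a + b with params[0] + params[1].
    pred = params[0] + params[1]
    return (pred - (a + b)) ** 2

meta_loss = 0.0
for a, b in [(1.0, 0.0), (0.0, 1.0)]:             # two toy tasks
    # Inner step: adapt theta with one gradient step, keeping the graph
    # so the outer update can differentiate through the adaptation.
    g, = torch.autograd.grad(task_loss(theta, a, b), theta, create_graph=True)
    theta_prime = theta - inner_lr * g
    meta_loss = meta_loss + task_loss(theta_prime, a, b)

opt.zero_grad()
meta_loss.backward()                              # outer update over adapted losses
opt.step()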
This paper develops a framework for prioritizing experience, so as to replay important transitions more frequently and therefore learn more efficiently, in Deep Q-Networks, a reinforcement learning algorithm that achieved human-level performance across many Atari games.
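A sketch of the proportional-prioritization rule, assuming NumPy: transitions are sampled with probability proportional to |TD error|^alpha, and importance-sampling weights correct the resulting bias. A real implementation would use a sum-tree rather than this linear scan.

import numpy as np

td_errors = np.abs(np.random.randn(1000))      # stand-in TD errors for a replay buffer
alpha, beta = 0.6, 0.4
priorities = td_errors ** alpha
probs = priorities / priorities.sum()          # P(i) ∝ |delta_i|^alpha
idx = np.random.choice(len(probs), size=32, p=probs)
# Importance-sampling weights correct the bias from non-uniform sampling.
weights = (len(probs) * probs[idx]) ** (-beta)
weights /= weights.max()                       # normalize for stability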
This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
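A sketch of the soft (entropy-regularized) Bellman target that the maximum-entropy framework induces, with stand-in tensors; alpha is the temperature that weights the entropy bonus.

import torch

r, done = torch.randn(32, 1), torch.zeros(32, 1)
q1_next, q2_next = torch.randn(32, 1), torch.randn(32, 1)  # twin critics at (s', a')
log_prob_next = torch.randn(32, 1)                         # log pi(a' | s')
gamma, alpha = 0.99, 0.2
# Soft target: the entropy bonus -alpha * log pi is folded into the backup.
y = r + gamma * (1 - done) * (torch.min(q1_next, q2_next) - alpha * log_prob_next)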
A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
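A Hogwild!-style simplification of the asynchronous pattern, assuming PyTorch: several worker processes update one shared model without locks. A3C itself additionally accumulates gradients from local copies and uses n-step actor-critic losses; the loss below is a stand-in.

import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(shared_model):
    # Each worker optimizes the shared parameters directly, lock-free.
    opt = torch.optim.SGD(shared_model.parameters(), lr=1e-3)
    for _ in range(100):
        loss = shared_model(torch.randn(1, 4)).pow(2).mean()  # stand-in loss
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    model = nn.Linear(4, 2)
    model.share_memory()           # parameters live in shared memory
    procs = [mp.Process(target=worker, args=(model,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()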
This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
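A sketch of the dueling head, assuming PyTorch: Q-values are recomposed from a state-value stream and an advantage stream, with the advantage mean subtracted for identifiability. Layer sizes are illustrative.

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, in_dim=64, n_actions=4):
        super().__init__()
        self.value = nn.Linear(in_dim, 1)        # V(s) stream
        self.adv = nn.Linear(in_dim, n_actions)  # A(s, a) stream

    def forward(self, h):
        v, a = self.value(h), self.adv(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

q = DuelingHead()(torch.randn(32, 64))   # (32, 4) action values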
This paper builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
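A sketch of that clipped double-Q target, assuming PyTorch: clipped noise smooths the target action (target policy smoothing), and the minimum over a pair of target critics bounds the overestimation. All networks and tensors are illustrative stand-ins.

import torch
import torch.nn as nn

target_actor = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
q1_t = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
q2_t = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

s2, r, done = torch.randn(32, 8), torch.randn(32, 1), torch.zeros(32, 1)
with torch.no_grad():
    a2 = target_actor(s2)
    noise = (0.2 * torch.randn_like(a2)).clamp(-0.5, 0.5)  # target policy smoothing
    a2 = (a2 + noise).clamp(-1.0, 1.0)
    sa2 = torch.cat([s2, a2], dim=1)
    # Clipped double-Q: take the minimum of the two target critics.
    y = r + 0.99 * (1 - done) * torch.min(q1_t(sa2), q2_t(sa2))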