3260 papers • 126 benchmarks • 313 datasets
The goal of Multi-agent Reinforcement Learning is to solve complex problems by integrating multiple agents that focus on different sub-tasks. In general, there are two types of multi-agent systems: independent and cooperative systems. Source: Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-Ray Reports
These leaderboards are used to track progress in Multi-agent Reinforcement Learning
Use these libraries to find Multi-agent Reinforcement Learning models and implementations
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
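As a rough sketch of this idea (not the paper's reference implementation), such a centralized critic can be written in PyTorch as a network that scores the joint action given every agent's observation; the class name, layer sizes, and input conventions below are illustrative assumptions.

import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Q(o_1..o_N, a_1..a_N): scores the joint action given all agents' observations."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        in_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        # joint_obs: (batch, n_agents, obs_dim), joint_act: (batch, n_agents, act_dim)
        x = torch.cat([joint_obs.flatten(1), joint_act.flatten(1)], dim=-1)
        return self.net(x)

# Each agent keeps its own decentralized actor for execution; only during training
# does the critic condition on the other agents' actions.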
The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.
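For orientation, a SMAC rollout with random actions looks roughly like the example below, following the pattern of the SMAC README; it assumes the smac Python package and a local StarCraft II installation, and API details may differ across versions.

from smac.env import StarCraft2Env
import numpy as np

env = StarCraft2Env(map_name="8m")        # 8 Marines vs 8 Marines micro scenario
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated = False
while not terminated:
    # Each agent picks uniformly among its currently available actions.
    actions = []
    for agent_id in range(n_agents):
        avail = np.nonzero(env.get_avail_agent_actions(agent_id))[0]
        actions.append(np.random.choice(avail))
    reward, terminated, info = env.step(actions)
env.close()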
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
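A minimal PyTorch sketch of such a monotonic mixing network is shown below: mixing weights produced by state-conditioned hypernetworks are forced non-negative with an absolute value, which is what makes Q_tot monotonic in each agent's value. Embedding sizes and layer choices here are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class QMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot, monotonic in every Q_i."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks map the global state to the mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2
        return q_tot.view(bs, 1)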
Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, therefore establishing a new state of the art in multi-agent reinforcement learning.
This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
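The decomposition itself is simple to state in code; assuming PyTorch tensors, the joint value is just the sum of the per-agent values, through which the single team reward's gradient reaches every agent's individual network.

import torch

def vdn_joint_q(agent_qs):
    """Value decomposition (VDN): the team action-value is the sum of per-agent values.

    agent_qs: tensor of shape (batch, n_agents) holding each agent's Q(o_i, a_i).
    Returns Q_tot of shape (batch, 1).
    """
    return agent_qs.sum(dim=1, keepdim=True)

q_tot = vdn_joint_q(torch.randn(32, 4))  # e.g. a batch of 32 steps with 4 agents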
An overview of the key components in RLCard, a discussion of its design principles, a brief introduction to the interfaces, and comprehensive evaluations of the environments are provided.
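A minimal usage example in the style of the RLCard README is given below; attribute names such as num_actions and num_players have changed across RLCard versions, so treat the exact calls as an assumption about a recent release.

import rlcard
from rlcard.agents import RandomAgent

# Create a toy poker environment and run one self-play game with random agents.
env = rlcard.make('leduc-holdem')
env.set_agents([RandomAgent(num_actions=env.num_actions) for _ in range(env.num_players)])
trajectories, payoffs = env.run(is_training=False)
print(payoffs)  # per-player payoffs for the finished game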
It is demonstrated that Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on the popular multi-agent benchmark suite SMAC with little hyperparameter tuning.
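A sketch of this independent-learning setup, under the assumption of one PyTorch actor-critic per agent: both the policy and the value head see only that agent's local observation, and each agent is then optimised with the ordinary single-agent PPO objective on its own trajectories.

import torch.nn as nn

class IndependentAgent(nn.Module):
    """IPPO-style agent: policy and value head see only the agent's local observation."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.policy = nn.Linear(hidden, n_actions)  # per-agent policy logits
        self.value = nn.Linear(hidden, 1)           # local value estimate V(o_i)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h)

# No centralised critic or joint value is learned; each agent runs standard PPO.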
LOLA is presented, a method in which each agent shapes the anticipated learning of the other agents in the environment; by explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest.
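The core idea of differentiating through the opponent's anticipated learning step can be expressed compactly; the sketch below assumes V1 and V2 are differentiable estimates of each agent's return as functions of both agents' parameters, and shows only a simplified first-order lookahead variant rather than the paper's full derivation.

import torch

def lola_grad(theta1, theta2, V1, V2, opp_lr=1.0):
    """Illustrative LOLA-style gradient for agent 1.

    theta1, theta2: parameter tensors with requires_grad=True.
    V1, V2: differentiable functions (theta1, theta2) -> scalar expected return.
    """
    # Simulated naive gradient-ascent step of the opponent, kept in the graph
    # so agent 1 can differentiate through it (create_graph=True).
    grad2 = torch.autograd.grad(V2(theta1, theta2), theta2, create_graph=True)[0]
    theta2_lookahead = theta2 + opp_lr * grad2
    # Agent 1's return evaluated at the opponent's post-update parameters:
    # the resulting gradient includes the opponent-shaping term.
    v1_after = V1(theta1, theta2_lookahead)
    return torch.autograd.grad(v1_after, theta1)[0]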
Two methods, using a multi-agent variant of importance sampling to naturally decay obsolete data and conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory, enable the successful combination of experience replay with multi-agent RL.
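The fingerprint idea amounts to augmenting each stored observation with quantities that track how old the experience is, such as the training iteration and the exploration rate; a minimal sketch (with assumed field choices) is below.

import numpy as np

def with_fingerprint(obs, train_iteration, epsilon):
    """Append a 'fingerprint' (training iteration and exploration rate) to an
    agent's observation before it is stored in the replay buffer, so the value
    function can disambiguate how old the sampled experience is."""
    return np.concatenate([obs, [float(train_iteration), float(epsilon)]])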