3260 papers • 126 benchmarks • 313 datasets
The goal of Multi-agent Reinforcement Learning is to solve complex problems by integrating multiple agents that focus on different sub-tasks. In general, there are two types of multi-agent systems: independent and cooperative systems. Source: Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-Ray Reports
These leaderboards are used to track progress in Multi-agent Reinforcement Learning
Use these libraries to find Multi-agent Reinforcement Learning models and implementations
An adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination is presented.
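As a rough sketch of this idea (not the paper's reference implementation), such a centralized critic can be written in PyTorch as a network that scores the joint action given every agent's observation; the class name, layer sizes, and input conventions below are illustrative assumptions.

import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Q(o_1..o_N, a_1..a_N): scores the joint action given all agents' observations."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        in_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        # joint_obs: (batch, n_agents, obs_dim), joint_act: (batch, n_agents, act_dim)
        x = torch.cat([joint_obs.flatten(1), joint_act.flatten(1)], dim=-1)
        return self.net(x)

# Each agent keeps its own decentralized actor for execution; only during training
# does the critic condition on the other agents' actions.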
The StarCraft Multi-Agent Challenge (SMAC), based on the popular real-time strategy game StarCraft II, is proposed as a benchmark problem, and an open-source deep multi-agent RL framework including state-of-the-art algorithms is released.
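For orientation, a SMAC rollout with random actions looks roughly like the example below, following the pattern of the SMAC README; it assumes the smac Python package and a local StarCraft II installation, and API details may differ across versions.

from smac.env import StarCraft2Env
import numpy as np

env = StarCraft2Env(map_name="8m")        # 8 Marines vs 8 Marines micro scenario
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated = False
while not terminated:
    # Each agent picks uniformly among its currently available actions.
    actions = []
    for agent_id in range(n_agents):
        avail = np.nonzero(env.get_avail_agent_actions(agent_id))[0]
        actions.append(np.random.choice(avail))
    reward, terminated, info = env.step(actions)
env.close()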
QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations, and structurally enforces that the joint action-value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning.
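A minimal PyTorch sketch of such a monotonic mixing network is shown below: mixing weights produced by state-conditioned hypernetworks are forced non-negative with an absolute value, which is what makes Q_tot monotonic in each agent's value. Embedding sizes and layer choices here are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class QMixer(nn.Module):
    """Mixes per-agent Q-values into Q_tot, monotonic in every Q_i."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks map the global state to the mixing weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2
        return q_tot.view(bs, 1)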
Results show that HATRPO and HAPPO significantly outperform strong baselines such as IPPO, MAPPO and MADDPG on all tested tasks, therefore establishing a new state of the art in multi-agent reinforcement learning.
This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
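The decomposition itself is simple to state in code; assuming PyTorch tensors, the joint value is just the sum of the per-agent values, through which the single team reward's gradient reaches every agent's individual network.

import torch

def vdn_joint_q(agent_qs):
    """Value decomposition (VDN): the team action-value is the sum of per-agent values.

    agent_qs: tensor of shape (batch, n_agents) holding each agent's Q(o_i, a_i).
    Returns Q_tot of shape (batch, 1).
    """
    return agent_qs.sum(dim=1, keepdim=True)

q_tot = vdn_joint_q(torch.randn(32, 4))  # e.g. a batch of 32 steps with 4 agents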
An overview of the key components in RLCard, a discussion of its design principles, a brief introduction to the interfaces, and comprehensive evaluations of the environments are provided.
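A minimal usage example in the style of the RLCard README is given below; attribute names such as num_actions and num_players have changed across RLCard versions, so treat the exact calls as an assumption about a recent release.

import rlcard
from rlcard.agents import RandomAgent

# Create a toy poker environment and run one self-play game with random agents.
env = rlcard.make('leduc-holdem')
env.set_agents([RandomAgent(num_actions=env.num_actions) for _ in range(env.num_players)])
trajectories, payoffs = env.run(is_training=False)
print(payoffs)  # per-player payoffs for the finished game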
It is demonstrated that Independent PPO (IPPO), a form of independent learning in which each agent simply estimates its local value function, can perform just as well as or better than state-of-the-art joint learning approaches on the popular multi-agent benchmark suite SMAC with little hyperparameter tuning.
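A sketch of this independent-learning setup, under the assumption of one PyTorch actor-critic per agent: both the policy and the value head see only that agent's local observation, and each agent is then optimised with the ordinary single-agent PPO objective on its own trajectories.

import torch.nn as nn

class IndependentAgent(nn.Module):
    """IPPO-style agent: policy and value head see only the agent's local observation."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.policy = nn.Linear(hidden, n_actions)  # per-agent policy logits
        self.value = nn.Linear(hidden, 1)           # local value estimate V(o_i)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h)

# No centralised critic or joint value is learned; each agent runs standard PPO.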
LOLA is presented, a method in which each agent shapes the anticipated learning of the other agents in the environment; by explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest.
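The core idea of differentiating through the opponent's anticipated learning step can be expressed compactly; the sketch below assumes V1 and V2 are differentiable estimates of each agent's return as functions of both agents' parameters, and shows only a simplified first-order lookahead variant rather than the paper's full derivation.

import torch

def lola_grad(theta1, theta2, V1, V2, opp_lr=1.0):
    """Illustrative LOLA-style gradient for agent 1.

    theta1, theta2: parameter tensors with requires_grad=True.
    V1, V2: differentiable functions (theta1, theta2) -> scalar expected return.
    """
    # Simulated naive gradient-ascent step of the opponent, kept in the graph
    # so agent 1 can differentiate through it (create_graph=True).
    grad2 = torch.autograd.grad(V2(theta1, theta2), theta2, create_graph=True)[0]
    theta2_lookahead = theta2 + opp_lr * grad2
    # Agent 1's return evaluated at the opponent's post-update parameters:
    # the resulting gradient includes the opponent-shaping term.
    v1_after = V1(theta1, theta2_lookahead)
    return torch.autograd.grad(v1_after, theta1)[0]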
Two methods, using a multi-agent variant of importance sampling to naturally decay obsolete data and conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory, enable the successful combination of experience replay with multi-agent RL.
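The fingerprint idea amounts to augmenting each stored observation with quantities that track how old the experience is, such as the training iteration and the exploration rate; a minimal sketch (with assumed field choices) is below.

import numpy as np

def with_fingerprint(obs, train_iteration, epsilon):
    """Append a 'fingerprint' (training iteration and exploration rate) to an
    agent's observation before it is stored in the replay buffer, so the value
    function can disambiguate how old the sampled experience is."""
    return np.concatenate([obs, [float(train_iteration), float(epsilon)]])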