Dota 2 with Large Scale Deep Reinforcement Learning (2019-12-13)

TL;DR

By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.

Abstract

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.

Authors

I. Sutskever

Christopher Berner

Greg Brockman
