In recent years, researchers have achieved great success in applying Deep Reinforcement Learning (DRL) algorithms to Real-time Strategy (RTS) games, creating strong autonomous agents that can defeat professional players in StarCraft II. However, existing approaches to tackling full games have high computational costs, usually requiring thousands of GPUs and CPUs for weeks. This paper makes two main contributions to address this issue: 1) we introduce Gym-µRTS (pronounced "gym-micro-RTS") as a fast-to-run RL environment for full-game RTS research, and 2) we present a collection of techniques to scale DRL to play full-game µRTS, along with ablation studies demonstrating their empirical importance. In a single-map setting, our best-trained bot can defeat every µRTS bot we tested from past µRTS competitions, resulting in a state-of-the-art DRL agent trained in only about 60 hours on a single machine (one GPU, three vCPUs, 16GB RAM).
Lukasz Grela