Counting to Explore and Generalize in Text-based Games (2018-06-29)

TL;DR

We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments, and we observe that the agent learns policies that generalize to unseen games of greater difficulty.

Abstract

We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments. We show promising results on a set of generated text-based games of varying difficulty, where the goal is to collect a coin located at the end of a chain of rooms. In contrast to previous text-based RL approaches, we observe that our agent learns policies that generalize to unseen games of greater difficulty.
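
The abstract does not spell out the exploration mechanism, so the following is only a minimal sketch of one way an episodic, count-based bonus could work. It assumes a binary bonus that is paid the first time an observation is seen within an episode, with counts cleared between episodes. The class name `EpisodicDiscoveryBonus`, the scale `beta`, and the room names are hypothetical and not taken from the paper.

```python
from collections import Counter


class EpisodicDiscoveryBonus:
    """Hypothetical helper: pays a bonus on the first visit to an observation per episode."""

    def __init__(self, beta: float = 1.0) -> None:
        self.beta = beta          # assumed bonus scale, not a value from the paper
        self.counts = Counter()   # per-episode visit counts keyed by observation text

    def reset(self) -> None:
        """Clear the counts at the start of a new episode."""
        self.counts.clear()

    def bonus(self, observation: str) -> float:
        """Record a visit and return beta if it is the first one this episode, else 0."""
        self.counts[observation] += 1
        return self.beta if self.counts[observation] == 1 else 0.0


tracker = EpisodicDiscoveryBonus(beta=1.0)
episode = ["kitchen", "hallway", "kitchen", "attic"]
bonuses = [tracker.bonus(obs) for obs in episode]

tracker.reset()                          # new episode: "kitchen" counts as unseen again
after_reset = tracker.bonus("kitchen")
```

Because the counts are reset every episode, a bonus of this kind keeps encouraging the agent to cover each new game rather than decaying permanently as training proceeds.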

Authors

Adam Trischler

Alessandro Sordoni

Rémi Tachet des Combes

References (24 items)

TextWorld: A Learning Environment for Text-based Games

Automatic differentiation in PyTorch

Count-Based Exploration in Feature Space for Reinforcement Learning

Parameter Space Noise for Exploration

What Can You Do with a Rock? Affordance Extraction via Word Embeddings

Count-Based Exploration with Neural Density Models

Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

Playing FPS Games with Deep Reinforcement Learning

Deep Exploration via Bootstrapped DQN

Deep Reinforcement Learning with a Natural Language Action Space

Deep Recurrent Q-Learning for Partially Observable MDPs

Language Understanding for Text-based Games using Deep Reinforcement Learning

Adam: A Method for Stochastic Optimization

Near-Bayesian exploration in polynomial time

An analysis of model-based Interval Estimation for Markov Decision Processes

Near-Optimal Reinforcement Learning in Polynomial Time

Planning and Acting in Partially Observable Stochastic Domains

Long Short-Term Memory

We use Adam (Kingma & Ba, 2014) as the step rule for optimization. The learning rate is 1e-3. The model is implemented using PyTorch.

R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning

11: Text the agent gets to observe for one of the level 10 easy games

A PyTorch implementation of the proposed method

When zero-shot evaluating hard games, we use max train step = 100
