An Optimistic Perspective on Offline Reinforcement Learning (2019-07-10)

TL;DR

The paper shows that recent off-policy deep RL algorithms, trained solely on the logged replay data of a DQN agent, can outperform the fully trained DQN agent itself. It also introduces Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates.
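The REM idea fits in a single clause, so a small worked version may help. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes K Q-value "heads" that are passed in as plain lists of per-action values instead of being produced by a network. It also assumes a Huber TD loss and mixture weights drawn uniformly and then normalized onto the simplex, once per mini-batch. All helper names (`sample_convex_weights`, `mix_heads`, `rem_td_loss`, `rem_batch_loss`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical transition layout: (online Q-heads, target Q-heads, action, reward, done).
# Each "heads" entry holds K lists, one list of per-action Q-values per head.
Transition = Tuple[Sequence[Sequence[float]], Sequence[Sequence[float]], int, float, bool]


def sample_convex_weights(num_heads: int, rng: random.Random) -> List[float]:
    """Draw K positive weights and normalize them to sum to 1 (a point on the simplex)."""
    raw = [1.0 - rng.random() for _ in range(num_heads)]  # each value lies in (0, 1]
    total = sum(raw)
    return [w / total for w in raw]


def mix_heads(heads: Sequence[Sequence[float]], alphas: Sequence[float]) -> List[float]:
    """Convex combination of the K heads, computed separately for each action."""
    num_actions = len(heads[0])
    return [sum(a * head[i] for a, head in zip(alphas, heads)) for i in range(num_actions)]


def huber(x: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |x| <= delta, linear beyond it."""
    ax = abs(x)
    return 0.5 * x * x if ax <= delta else delta * (ax - 0.5 * delta)


def rem_td_loss(q_heads, target_heads, action, reward, done, alphas, gamma=0.99):
    """TD loss for one transition, using the SAME mixture weights on both sides."""
    q_mix = mix_heads(q_heads, alphas)
    next_mix = mix_heads(target_heads, alphas)
    bootstrap = 0.0 if done else gamma * max(next_mix)  # greedy max over mixed Q-values
    return huber(q_mix[action] - (reward + bootstrap))


def rem_batch_loss(batch: Sequence[Transition], rng: random.Random, gamma: float = 0.99) -> float:
    """Average loss over a mini-batch, with one fresh draw of mixture weights per mini-batch."""
    num_heads = len(batch[0][0])
    alphas = sample_convex_weights(num_heads, rng)
    losses = [rem_td_loss(q, t, a, r, d, alphas, gamma) for q, t, a, r, d in batch]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy mini-batch with K = 2 heads and 2 actions (values are illustrative only).
    batch = [([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 3.0]], 1, 1.0, False)]
    print(rem_batch_loss(batch, random.Random(0), gamma=0.5))
```

In this sketch the online and target mixtures share one weight vector. That way, the Bellman target and the estimate being regressed come from the same convex combination, which is one reading of "Bellman consistency on random convex combinations". Resampling the weights every mini-batch means that, over many updates, the sketch pushes toward consistency across many different mixtures rather than a single fixed average of the heads.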

Authors

Mohammad Norouzi

Rishabh Agarwal

D. Schuurmans
