Conservative Q-learning (CQL) is proposed to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
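A minimal discrete-action sketch of this idea (not the paper's reference implementation), assuming PyTorch and hypothetical q_net / target_q_net modules that map a batch of observations to per-action Q-values: the usual TD error is augmented with a penalty that pushes Q-values down on all actions while keeping the Q-value of the dataset action high.

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_q_net, batch, alpha=1.0, gamma=0.99):
    """Conservative Q-learning sketch for discrete actions.

    Loss = TD error + alpha * (logsumexp_a Q(s, a) - Q(s, a_data)),
    so the learned Q-function is pushed down on out-of-distribution actions
    and stays a lower bound on the value of the dataset policy.
    """
    obs, actions, rewards, next_obs, dones = batch  # tensors from an offline dataset

    q_all = q_net(obs)                                        # (B, num_actions)
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)  # Q(s, a_data)

    with torch.no_grad():  # standard Bellman target from a frozen target network
        target = rewards + gamma * (1 - dones) * target_q_net(next_obs).max(dim=1).values

    td_loss = F.mse_loss(q_data, target)
    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_data).mean()
    return td_loss + alpha * conservative_penalty
```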
It is shown that the design decisions behind Acme yield agents that can be scaled both up and down, and that, for the most part, greater levels of parallelization produce agents with equivalent performance, only trained faster.
This paper proposes RL Unplugged, a benchmark for evaluating and comparing offline RL methods, along with detailed evaluation protocols for each domain and an extensive analysis of supervised learning and offline RL methods under these protocols.
This work presents a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio).
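A rough sketch of how these two knobs typically appear in a Q-learning training loop, assuming a Gym-style env and a hypothetical agent with act/update methods; the constants and names are illustrative, not taken from the paper.

```python
import random
from collections import deque

# The two properties studied: replay capacity (buffer size) and replay ratio
# (gradient updates per environment transition collected).
REPLAY_CAPACITY = 1_000_000   # transitions beyond this are evicted FIFO
REPLAY_RATIO = 0.25           # e.g. one update for every four transitions collected

buffer = deque(maxlen=REPLAY_CAPACITY)

def collect_and_train(env, agent, total_steps, batch_size=32):
    obs = env.reset()
    updates_owed = 0.0
    for _ in range(total_steps):
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

        # Accumulate fractional updates so any replay ratio is supported.
        updates_owed += REPLAY_RATIO
        while updates_owed >= 1.0 and len(buffer) >= batch_size:
            agent.update(random.sample(buffer, batch_size))
            updates_owed -= 1.0
```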
It is demonstrated that recent off-policy deep RL algorithms, even when trained solely on a logged DQN replay dataset, outperform the fully trained DQN agent; Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates, is also presented.
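A minimal sketch of the REM idea, assuming PyTorch and a hypothetical list of Q-heads with frozen target counterparts (the paper uses a single multi-head network): each update draws random convex weights and applies the Bellman backup to the resulting mixture of Q-estimates.

```python
import torch
import torch.nn.functional as F

def rem_loss(q_heads, target_q_heads, batch, gamma=0.99):
    """Random Ensemble Mixture (REM) sketch for discrete actions.

    Each update samples a random convex combination of the K Q-heads and
    enforces the Bellman equation on that mixture, so convex combinations of
    the heads stay approximately Bellman-consistent.
    """
    obs, actions, rewards, next_obs, dones = batch
    num_heads = len(q_heads)

    # Random convex combination weights (non-negative, summing to 1).
    alphas = torch.rand(num_heads)
    alphas = alphas / alphas.sum()

    q_mix = sum(a * head(obs) for a, head in zip(alphas, q_heads))   # (B, num_actions)
    q_pred = q_mix.gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        target_mix = sum(a * head(next_obs) for a, head in zip(alphas, target_q_heads))
        target = rewards + gamma * (1 - dones) * target_mix.max(dim=1).values

    return F.smooth_l1_loss(q_pred, target)
```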