3260 papers • 126 benchmarks • 313 datasets
Continuous control, in the context of playing games and in artificial intelligence (AI) and machine learning (ML) more broadly, refers to making a series of smooth, ongoing adjustments to steer a game or simulation. This is in contrast to discrete control, where actions are limited to a set of specific, distinct choices. Continuous control is crucial in environments where precision, timing, and the magnitude of actions matter, such as driving a car in a racing game, controlling a character in a simulation, or managing the flight of an aircraft in a flight simulator.
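The distinction is easiest to see in an environment's action space. Below is a minimal sketch using the Gymnasium API (one common RL toolkit, chosen here as an assumption; the two environments are standard examples of each regime):

```python
import gymnasium as gym

# Discrete control: the agent picks one of a few distinct actions.
discrete_env = gym.make("CartPole-v1")
print(discrete_env.action_space)    # Discrete(2): push cart left or right

# Continuous control: actions are real-valued vectors, so both the
# direction and the magnitude of each actuation matter.
continuous_env = gym.make("Pendulum-v1")
print(continuous_env.action_space)  # Box(-2.0, 2.0, (1,), float32): applied torque

obs, info = continuous_env.reset(seed=0)
action = continuous_env.action_space.sample()  # e.g. array([0.37], dtype=float32)
obs, reward, terminated, truncated, info = continuous_env.step(action)
```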
These leaderboards are used to track progress in Continuous Control.
Use these libraries to find Continuous Control models and implementations.
This paper proposes Soft Actor-Critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework; it achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
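As a rough illustration of the maximum-entropy idea, here is a sketch of the soft Bellman target the algorithm's critics regress toward (the names `policy`, `q1_targ`, and `q2_targ` are hypothetical placeholders, not the authors' reference code):

```python
import torch

def soft_q_target(reward, done, next_obs, policy, q1_targ, q2_targ,
                  alpha=0.2, gamma=0.99):
    """Soft Bellman target in the style of SAC (sketch). `policy(next_obs)`
    is assumed to return a sampled action and its log-probability."""
    with torch.no_grad():
        next_action, next_logp = policy(next_obs)
        # Clipped double-Q to curb overestimation; subtracting
        # alpha * log pi adds the entropy bonus that keeps the
        # policy stochastic and exploratory.
        q_next = torch.min(q1_targ(next_obs, next_action),
                           q2_targ(next_obs, next_action))
        return reward + gamma * (1.0 - done) * (q_next - alpha * next_logp)
```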
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many tasks the algorithm can learn policies end-to-end, directly from raw pixel inputs.
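The core of the deterministic policy gradient is that the actor is trained to output the continuous action that maximizes the critic's value; a minimal sketch, assuming `actor` and `critic` are differentiable torch modules (hypothetical names):

```python
import torch

def dpg_actor_loss(obs, actor, critic):
    """Deterministic policy gradient objective (sketch): perform
    gradient ascent on Q(s, mu(s)) with respect to the actor's
    parameters, with gradients flowing through the critic."""
    action = actor(obs)                 # deterministic action mu(s)
    return -critic(obs, action).mean()  # minimize -Q == maximize Q
```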
A new family of policy gradient methods for reinforcement learning is proposed; these methods alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
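The best-known member of this family uses a clipped surrogate objective; a sketch of that loss (the argument names are illustrative, not the paper's code):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective in the style of PPO (sketch).
    The probability ratio is clipped so a single update cannot
    move the policy too far from the one that collected the data."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Taking the elementwise minimum makes the objective pessimistic,
    # removing the incentive to exploit the clipped region.
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```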
This paper builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
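A sketch of the resulting target computation, including the target-policy smoothing noise the method also uses (module names and default hyperparameters are assumptions for illustration):

```python
import torch

def clipped_double_q_target(reward, done, next_obs, actor_targ,
                            q1_targ, q2_targ, gamma=0.99,
                            noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """TD3-style target (sketch): the minimum over two target critics
    limits the overestimation bias that plain Q-learning accumulates."""
    with torch.no_grad():
        mu = actor_targ(next_obs)
        noise = (torch.randn_like(mu) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (mu + noise).clamp(-act_limit, act_limit)
        q_next = torch.min(q1_targ(next_obs, next_action),
                           q2_targ(next_obs, next_action))
        return reward + gamma * (1.0 - done) * q_next
```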
Dreamer is presented, a reinforcement learning agent that solves long-horizon tasks purely by latent imagination and efficiently learns behaviors by backpropagating analytic gradients of learned state values through trajectories imagined in the compact state space of a learned world model.
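Very roughly, "learning by imagination" means unrolling a learned latent dynamics model and differentiating the predicted returns with respect to the actor. The sketch below conveys only that idea; the real agent additionally uses an RSSM world model, lambda-returns, and separate value learning, and every callable here is a hypothetical differentiable module:

```python
import torch

def imagined_return_loss(init_latent, world_model, actor, value_fn, horizon=15):
    """Backpropagate through an imagined latent trajectory (sketch).
    `world_model(latent, action)` is assumed to return the next
    latent state and a predicted reward."""
    latent, total = init_latent, 0.0
    for _ in range(horizon):
        action = actor(latent)
        latent, reward = world_model(latent, action)  # imagined transition
        total = total + reward
    # Analytic gradients flow from the bootstrapped value estimate
    # back through the whole imagined trajectory into the actor.
    return -(total + value_fn(latent)).mean()
```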
This work introduces a random search method for training static, linear policies for continuous control problems, matching state-of-the-art sample efficiency on the benchmark MuJoCo locomotion tasks.
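The underlying update is strikingly simple; a sketch of one step of basic random search over a linear policy (the full method in the paper also normalizes states and keeps only the top-performing directions, and `rollout_return` is an assumed helper that evaluates the episodic return of the policy a = W @ s):

```python
import numpy as np

def random_search_step(weights, rollout_return, step_size=0.02,
                       noise=0.03, n_dirs=8):
    """One update of basic random search (sketch): probe random
    perturbation directions in weight space and move along the
    finite-difference estimate of the return gradient."""
    grad = np.zeros_like(weights)
    for _ in range(n_dirs):
        delta = np.random.randn(*weights.shape)
        r_plus = rollout_return(weights + noise * delta)
        r_minus = rollout_return(weights - noise * delta)
        grad += (r_plus - r_minus) * delta
    return weights + step_size / (n_dirs * noise) * grad
```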
A suite of challenging continuous control tasks (integrated with OpenAI Gym), based on existing robotics hardware and following a multi-goal reinforcement learning (RL) framework, is introduced.
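In the multi-goal framework, observations are dictionaries that separate the state from the achieved and desired goals, and rewards can be recomputed for any substituted goal. A sketch using the gymnasium-robotics packaging of these tasks (an assumption about current packaging and environment IDs; the paper used the original OpenAI Gym releases):

```python
import gymnasium as gym
import gymnasium_robotics

gym.register_envs(gymnasium_robotics)  # registers the Fetch tasks
env = gym.make("FetchReach-v2")
obs, info = env.reset(seed=0)
# Multi-goal observations are dictionaries rather than flat arrays:
print(obs["observation"])    # proprioceptive state of the arm
print(obs["achieved_goal"])  # where the gripper currently is
print(obs["desired_goal"])   # where this episode wants it to be
# Rewards can be recomputed for any substituted goal, which is what
# makes hindsight-style goal relabeling possible:
r = env.unwrapped.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
```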
This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
Conservative Q-learning (CQL) is proposed, which aims to address limitations of offline RL methods by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value.
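The conservatism comes from an extra regularizer added to the usual Bellman error; a sketch of its simplest form (names and shapes are assumptions for illustration, and `q_net` is assumed to return a tensor of shape `(batch,)`):

```python
import torch

def cql_penalty(q_net, obs, dataset_actions, action_samples):
    """Conservative regularizer in the style of CQL (sketch): push
    Q-values down on broadly sampled (potentially out-of-distribution)
    actions via a soft maximum, and push them up on the actions that
    actually appear in the offline dataset.
    `action_samples` is an iterable of (batch, act_dim) action tensors."""
    q_sampled = torch.stack([q_net(obs, a) for a in action_samples])  # (n, batch)
    penalty = torch.logsumexp(q_sampled, dim=0) - q_net(obs, dataset_actions)
    return penalty.mean()
```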
This work addresses two challenges of policy gradient methods, the large number of samples typically required and the difficulty of obtaining stable, steady improvement despite the nonstationarity of the incoming data, by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias.
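The resulting estimator is usually computed with a simple backward recursion over a trajectory; a sketch (array layout is an assumption: `values` holds V(s_0..s_T) with one extra bootstrap value at the end):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation (sketch). `lam` trades lower
    variance (small lam) against lower bias (large lam)."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        # One-step TD error, then exponentially weighted accumulation.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```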