Continuous control with deep reinforcement learning

Published in

International Conference on Learning Representa...(2015)

TL;DR

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

Abstract

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
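The deterministic policy gradient named in the abstract updates the actor by chaining the critic's action gradient through the policy. The following is a minimal sketch of that chain rule on a toy problem, not the paper's implementation. The quadratic critic is assumed to be known in closed form rather than learned, and the linear actor, the function names, and all hyper-parameters are hypothetical.

```python
import random


def critic_grad_a(s, a):
    """dQ/da for an assumed toy critic Q(s, a) = -(a - 2s)^2."""
    return -2.0 * (a - 2.0 * s)


def train_actor(steps=2000, lr=0.05, batch=32, seed=0):
    """Gradient ascent on a linear actor mu_theta(s) = theta * s."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        grad = 0.0
        for _ in range(batch):
            s = rng.uniform(-1.0, 1.0)       # sampled state
            a = theta * s                    # deterministic action
            # Chain rule: dmu/dtheta (= s) times dQ/da at a = mu_theta(s).
            grad += s * critic_grad_a(s, a)
        theta += lr * grad / batch           # ascend the estimated gradient
    return theta


theta = train_actor()
```

Under this toy critic the best action is a = 2s, so the learned `theta` should approach 2. In the actor-critic setting the abstract describes, the critic would itself be learned rather than given.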

Authors

A. Pritzel

5 papers

N. Heess

9 papers

Daan Wierstra

12 papers

David Silver

21 papers

T. Lillicrap

23 papers

Jonathan J. Hunt

1 paper

Yuval Tassa

5 papers

References (35 items)

Memory-based control with recurrent neural networks

Learning Continuous Control Policies by Stochastic Value Gradients

Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

What Is Machine Learning

Gradient Estimation Using Stochastic Computation Graphs

End-to-End Training of Deep Visuomotor Policies

Human-level control through deep reinforcement learning

Trust Region Policy Optimization

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

From Pixels to Torques: Policy Learning with Deep Dynamical Models

Adam: A Method for Stochastic Optimization

Online Evolution of Deep Convolutional Network for Vision-Based Reinforcement Learning

Evolving deep unsupervised convolutional networks for vision-based reinforcement learning

Deterministic Policy Gradient Algorithms

Playing Atari with Deep Reinforcement Learning

A Survey on Policy Search for Robotics

Autonomous reinforcement learning with experience replay

MuJoCo: A physics engine for model-based control

Synthesis and stabilization of complex behaviors through online trajectory optimization

ImageNet classification with deep convolutional neural networks

Reinforcement learning in feedback control

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Double Q-learning

States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning

Real-time reinforcement learning by sequential Actor-Critics and experience replay

A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems

Simulating Action Dynamics with Neural Process Networks

Delving into Transferable Adversarial Examples and Black-box Attacks

Control Policy with Autocorrelated Noise in Reinforcement Learning for Robotics

Deep sparse rectifier networks

Adaptive critic designs

On the theory of Brownian motion

1930) with θ = 0.15 and σ = 0.3. The Ornstein-Uhlenbeck process models the velocity

Agent should move forward as quickly as possible with a bipedal walker constrained to the plane without falling down or pitching the torso too far forward or backward
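One of the excerpts above quotes an Ornstein-Uhlenbeck process with θ = 0.15 and σ = 0.3. The following is a minimal sketch of how such temporally correlated noise could be simulated, assuming an Euler-Maruyama discretisation. The step size `dt = 1`, the long-run mean `mu = 0`, the seed, and the function names are assumptions, not values taken from the listing.

```python
import math
import random


def ou_noise(n_steps, theta=0.15, sigma=0.3, mu=0.0, dt=1.0, seed=0):
    """Simulate a 1-D Ornstein-Uhlenbeck path (Euler-Maruyama, assumed dt)."""
    rng = random.Random(seed)
    x = mu
    path = []
    for _ in range(n_steps):
        # Mean-reverting drift toward mu plus scaled Gaussian noise.
        x += theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def lag1_autocorrelation(xs):
    """Sample autocorrelation between consecutive values."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


noise = ou_noise(n_steps=5000)
```

Because each sample is pulled only partway back toward the mean, consecutive values stay correlated, which `lag1_autocorrelation(noise)` can be used to check.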

Field of Study

Computer Science, Mathematics

Journal Information

Name

arXiv: Learning

Venue Information

Name

International Conference on Learning Representations

Type

conference

URL

https://iclr.cc/

Alternate Names

Int Conf Learn Represent
ICLR