An algorithm for rapidly learning neural network policies for robotics systems that follows the model-based reinforcement learning paradigm and improves upon existing algorithms: PILeO and a sample-based version of PILeo with neural network dynamics (Deep-PILeO).
Ahstract- We present an algorithm for rapidly learning neural network policies for robotics systems. The algorithm follows the model-based reinforcement learning paradigm and improves upon existing algorithms: PILeO and a sample-based version of PILeo with neural network dynamics (Deep-PILeO). To improve convergence, we propose a model-based algorithm that uses fixed random numbers and clips gradients during optimization. We propose training a neural network dynamics model using variational dropout with truncated Log-Normal noise. These improvements enable data-efficient synthesis of complex neural network policies. We test our approach on a variety of benchmark tasks, demonstrating data-efficiency that is competitive with that of PILeO, while being able to optimize complex neural network controllers. Finally, we assess the performance of the algorithm for learning motor controllers for a six legged autonomous underwater vehicle. This demonstrates the potential of the algorithm for scaling up the dimensionality and dataset sizes, in more complex tasks.