Policy Search Methods for Robotics


Gerhard Neumann (University of Lincoln)
Jan Peters (TU Darmstadt)


Location: WTC Center, São Paulo
Date: May 8th, afternoon session


Policy search is a subfield of reinforcement learning that focuses on finding good parameters for a given policy parametrization. It is well suited for robotics because it can be used with efficient parametric movement representations, can cope with high-dimensional state and action spaces, and can deal with high-dimensional sensory inputs such as camera images. In recent years, there has been significant progress in both theory and applications. Our tutorial focuses on a unified, information-theoretic view of policy search methods and reviews existing algorithms in light of this framework.

Many current policy search methods rely on stochastic policies to explore the state and action space of the agent. The information contained in such a stochastic policy represents our belief about the location of the optimal solution. An information-theoretic view of policy search yields principled trade-offs between keeping the information in the current policy and incorporating the information contained in newly experienced samples, a trade-off known in reinforcement learning as the exploration-exploitation trade-off. Most currently used policy search methods can be understood in light of these information-theoretic principles.
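To make this trade-off concrete, here is a minimal sketch of a KL-bounded reweighting of samples in the style of episode-based relative entropy policy search. It assumes N samples drawn from the current policy, so the old distribution over those samples is taken as uniform, q_i = 1/N. Maximizing the expected return sum_i p_i R_i subject to KL(p || q) <= epsilon gives weights p_i proportional to q_i exp(R_i / eta), where the temperature eta is chosen so that the bound holds. A small epsilon keeps the update close to the current policy; a large epsilon lets it commit greedily to the best samples. The function name `kl_weights` and the bisection search for eta are illustrative choices, not an algorithm taken from the tutorial.

```python
import math


def kl_weights(returns, epsilon, iters=100):
    """Illustrative helper: normalized weights p_i ~ exp(R_i / eta), with eta
    found by bisection so that KL(p || uniform) is as close to epsilon as
    possible without exceeding it."""
    n = len(returns)
    r_max = max(returns)

    def weights(eta):
        # Subtracting the max return keeps every exponent <= 0 (no overflow).
        w = [math.exp((r - r_max) / eta) for r in returns]
        s = sum(w)
        return [wi / s for wi in w]

    def kl_to_uniform(p):
        # KL divergence to the uniform distribution over the N old samples.
        return sum(pi * math.log(pi * n) for pi in p if pi > 0.0)

    lo, hi = 1e-6, 1e6  # bracket for eta, bisected in log space
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if kl_to_uniform(weights(mid)) > epsilon:
            lo = mid  # update too greedy: raise the temperature
        else:
            hi = mid  # bound satisfied: try a greedier update
    return weights(hi)
```

For example, `kl_weights([0.0, 1.0, 2.0, 3.0], 0.1)` returns weights that increase with the return and whose KL divergence from the uniform distribution is 0.1. The weighted samples can then be used to refit the policy, for instance with a weighted maximum-likelihood fit of a Gaussian.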

Within this framework, we will introduce a taxonomy of policy search methods that distinguishes between episode-based and step-based policy search, as well as between model-free and model-based policy search. Episode-based methods are the simplest form of policy search: they directly evaluate the quality of a parameter vector by testing it on the real platform. In contrast, step-based approaches decompose each evaluation, represented as a trajectory, into its individual time steps and evaluate the quality of executing an action in a given state. They are typically more difficult to use but can handle more complex policy classes.
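An episode-based method treats each parameter vector as a black box: sample parameters, run one episode per sample, and update the search distribution from the resulting returns. The sketch below assumes a Gaussian search distribution with a diagonal covariance and replaces the real platform with a hypothetical `rollout` function (a toy quadratic return with its optimum at (1, -2)). For brevity it uses a cross-entropy-style update, refitting the Gaussian to the best samples, rather than one of the information-theoretic updates the tutorial discusses.

```python
import random
import statistics


def rollout(theta):
    """Hypothetical stand-in for one episode on the robot.

    Returns the episode return of parameter vector `theta`; here a toy
    quadratic whose optimum is at (1, -2)."""
    target = (1.0, -2.0)
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def episode_based_search(dim=2, n_samples=50, n_elite=10, iterations=50, seed=0):
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [2.0] * dim  # initial exploration noise per parameter
    for _ in range(iterations):
        # 1. Sample parameter vectors from the Gaussian search distribution.
        thetas = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                  for _ in range(n_samples)]
        # 2. Evaluate each parameter vector with one full episode and rank them.
        ranked = sorted(thetas, key=rollout, reverse=True)
        # 3. Refit mean and std to the best samples (cross-entropy-style update).
        elite = ranked[:n_elite]
        mean = [statistics.fmean(t[d] for t in elite) for d in range(dim)]
        # Keep a small floor on the std so exploration never collapses to zero.
        std = [max(statistics.pstdev([t[d] for t in elite]), 1e-3)
               for d in range(dim)]
    return mean
```

`episode_based_search()` returns a mean close to (1, -2). On a robot, every call to `rollout` would be a full trial on the hardware, which is why the number of samples per iteration matters in practice.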

Model-free policy search is a general approach to learning policies from sampled trajectories. Learning a policy is often easier than learning an accurate forward model. However, every sampled trajectory requires interaction with the robot, which can be time-consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot’s dynamics from data and then using it for policy learning.
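The model-based loop can be sketched on a toy problem. Everything here is an assumption made for illustration: a hypothetical one-dimensional linear system `real_robot_step` stands in for the robot, the learned model is a least-squares fit x' ≈ a·x + b·u, and the policy is a linear feedback law u = -k·x whose gain is chosen by grid search on a quadratic cost evaluated only in the learned model.

```python
import random


def real_robot_step(x, u, rng):
    """Hypothetical stand-in for the real system (unknown to the learner):
    x' = 0.9 x + 0.5 u plus small Gaussian noise."""
    return 0.9 * x + 0.5 * u + rng.gauss(0.0, 0.01)


def fit_linear_model(transitions):
    """Least-squares fit of x' ~ a x + b u by solving the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in transitions)
    sxu = sum(x * u for x, u, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    sxy = sum(x * y for x, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def simulated_cost(k, a, b, x0=1.0, horizon=20):
    """Quadratic cost of the policy u = -k x, rolled out in the model (a, b)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x * x + 0.1 * u * u
        x = a * x + b * u
    return cost


def model_based_search(n_transitions=200, seed=0):
    rng = random.Random(seed)
    # 1. Interact with the (expensive) real system using random actions.
    transitions = []
    for _ in range(n_transitions):
        x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        transitions.append((x, u, real_robot_step(x, u, rng)))
    # 2. Learn a forward model from the collected data.
    a, b = fit_linear_model(transitions)
    # 3. Improve the policy using only the learned model (no further robot time).
    gains = [i / 100 for i in range(301)]
    best_k = min(gains, key=lambda k: simulated_cost(k, a, b))
    return a, b, best_k
```

Only step 1 touches the real system; every cost evaluation in step 3 runs in the learned model. With richer dynamics, the linear fit would be replaced by a more expressive model class.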




Gerhard Neumann is a full professor and chair of Robotics & Autonomous Systems at the University of Lincoln. Before coming to Lincoln, he was an Assistant Professor at TU Darmstadt from September 2014 to October 2016 and head of the Computational Learning for Autonomous Systems (CLAS) group. Before that, he was a postdoc and group leader at the Intelligent Autonomous Systems (IAS) group, also in Darmstadt, under the guidance of Prof. Jan Peters. Gerhard obtained his Ph.D. under the supervision of Prof. Wolfgang Maass at Graz University of Technology.

Jan Peters is a full professor (W3) for Intelligent Autonomous Systems at the Computer Science Department of the Technische Universitaet Darmstadt and, at the same time, an adjunct senior research scientist at the Max-Planck Institute for Intelligent Systems, where he heads the interdepartmental Robot Learning Group between the departments of Empirical Inference and Autonomous Motion. Jan Peters has received a few awards, most notably the Dick Volz Best 2007 US PhD Thesis Runner-Up Award, the 2012 Robotics: Science & Systems Early Career Spotlight, the 2013 IEEE Robotics & Automation Society’s Early Career Award, and the 2013 INNS Young Investigator Award. In 2015, he received an ERC Starting Grant.