Using Inaccurate Models in Reinforcement Learning

In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or “simulator”) of the Markov decision process. However, for highdimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus often the algorithm returns a policy that works in simulation but not in real-life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model, and only a small number of real-life trials. The key idea is to successively “ground” the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that—when given only a crude model and a small number of real-life trials—our algorithm can obtain near-optimal performance in the real system. Authors: Pieter Abbeel, Morgan Quigley, Andrew Y. Ng (2006)
AUTHORED BY
Pieter Abbeel
Morgan Quigley
Andrew Y. Ng

Abstract

In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or “simulator”) of the Markov decision process. However, for highdimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus often the algorithm returns a policy that works in simulation but not in real-life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model, and only a small number of real-life trials. The key idea is to successively “ground” the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that—when given only a crude model and a small number of real-life trials—our algorithm can obtain near-optimal performance in the real system.

Download PDF

Related Projects

Leave a Reply

You must be logged in to post a comment