Applying Online-search to Reinforcement Learning

Authors: Scott Davies, Andrew Y. Ng, Andrew Moore (1998)
AUTHORED BY
Scott Davies
Andrew Y. Ng
Andrew Moore

Abstract

In reinforcement learning it is frequently necessary to resort to an approximation to the true optimal value function. Here we investigate the benefits of online search in such cases. We examine "local" searches, where the agent performs a finite-depth lookahead search, and "global" searches, where the agent performs a search for a trajectory all the way from the current state to a goal state. The key to the success of these methods lies in taking a value function, which gives a rough solution to the hard problem of finding good trajectories from every single state, and combining that with online search, which then gives an accurate solution to the easier problem of finding a good trajectory specifically from the current state.
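
To make the "local" search idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a deterministic, discrete toy problem (a chain of states 0 to 10 with the goal at 10 and unit step costs) and a hypothetical approximate cost-to-go function `approx_value` that contains a deliberate error at state 5. All names here (`lookahead_value`, `choose_action`, `step`, `actions`, `is_goal`, `approx_value`) are illustrative assumptions. The agent searches to a fixed depth and scores the leaves with the approximate value function.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical toy problem: a chain of states 0..10, goal at 10, unit step cost.
GOAL = 10

def actions(state: int) -> Iterable[int]:
    """Moves available in a state: step left (-1) and/or right (+1)."""
    moves = []
    if state > 0:
        moves.append(-1)
    if state < GOAL:
        moves.append(+1)
    return moves

def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic transition: returns (next_state, cost of the move)."""
    return state + action, 1.0

def is_goal(state: int) -> bool:
    return state == GOAL

def approx_value(state: int) -> float:
    """Rough cost-to-go estimate. The true value is GOAL - state; the
    estimate at state 5 is deliberately wrong to mimic approximation error."""
    return 9.0 if state == 5 else float(GOAL - state)

def lookahead_value(state: int, depth: int,
                    value_fn: Callable[[int], float]) -> float:
    """Cost-to-go from `state`, searching `depth` more moves and falling
    back on the approximate value function at the leaves."""
    if is_goal(state):
        return 0.0
    if depth == 0:
        return value_fn(state)
    best = float("inf")
    for a in actions(state):
        nxt, cost = step(state, a)
        best = min(best, cost + lookahead_value(nxt, depth - 1, value_fn))
    return best

def choose_action(state: int, depth: int,
                  value_fn: Callable[[int], float] = approx_value) -> int:
    """Pick the action whose depth-limited backed-up cost is lowest.
    depth=1 is the usual greedy one-step lookahead."""
    def backed_up_cost(a: int) -> float:
        nxt, cost = step(state, a)
        return cost + lookahead_value(nxt, depth - 1, value_fn)
    return min(actions(state), key=backed_up_cost)

if __name__ == "__main__":
    for d in (1, 2):
        print(f"depth {d}: action from state 4 is {choose_action(4, d)}")
```

In this toy setting, a one-step lookahead from state 4 is misled by the bad estimate at state 5 and moves away from the goal, while a two-step lookahead sees past the error and moves toward it. A "global" search in the sense of the abstract would instead continue the search until a goal state is reached rather than stopping at a fixed depth.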
