Incremental least squares policy iteration for POMDPs

Hui Li, Xuejun Liao, Lawrence Carin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function by minimizing the square of the Bellman residual, and performs policy improvement in reachable belief states. The algorithm determines a set of optimal basis functions that minimize the Bellman residual incrementally, via efficient computations. We show that, by using optimally determined basis functions, the policy can be improved successively on a set of the most probable belief points sampled from the reachable belief set. As the ILSPI is based on belief sample points, it represents a point-based policy iteration method. The results on four benchmark problems show that the ILSPI compares competitively to its value-iteration counterparts in terms of both performance and computational efficiency. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
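The core fixed-policy step the abstract describes — fitting value-function weights over basis features so as to minimize the squared Bellman residual at sampled belief points — can be sketched in closed form. This is a minimal illustration of generic least-squares Bellman-residual minimization, not the paper's incremental basis-selection procedure; the feature matrices, transition matrix, and rewards below are hypothetical placeholders.

```python
import numpy as np

def brm_weights(Phi, Phi_next, r, gamma):
    """Least-squares Bellman-residual minimization for a fixed policy.

    Phi      : (N, K) basis features at N sampled belief points
    Phi_next : (N, K) expected basis features at the successor beliefs
    r        : (N,)   expected immediate rewards
    Solves   min_w || Phi w - (r + gamma * Phi_next w) ||^2,
    i.e. an ordinary least-squares problem in A = Phi - gamma * Phi_next.
    """
    A = Phi - gamma * Phi_next
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w

# Sanity check with an identity (tabular) basis, where the least-squares
# solution must coincide with exact policy evaluation (I - gamma*P) v = r.
gamma = 0.9
P = np.array([[0.7, 0.3, 0.0],      # hypothetical fixed-policy transitions
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])       # hypothetical per-state rewards
w = brm_weights(np.eye(3), P, r, gamma)
v_exact = np.linalg.solve(np.eye(3) - gamma * P, r)
```

With a richer basis (e.g. radial functions over the belief simplex), the same least-squares step yields an approximate value function over the sampled reachable beliefs; ILSPI's contribution is choosing such basis functions incrementally.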
Original language: English (US)
Title of host publication: Proceedings of the National Conference on Artificial Intelligence
Pages: 1167-1172
Number of pages: 6
State: Published - Nov 13 2006
Externally published: Yes
