We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision processes (POMDPs). The ILSPI algorithm computes a basis representation of the infinite-horizon value function by minimizing the square of Bellman residual and performs policy improvement in reachable belief states. A number of optimal basis functions are determined by the algorithm to minimize the Bellman residual incrementally, via efficient computations. We show that, by using optimally determined basis functions, the policy can be improved successively on a set of most probable belief points sampled from the reachable belief set. As the ILSPI is based on belief sample points, it represents a point-based policy iteration method. The results on four benchmark problems show that the ILSPI compares competitively to its value-iteration counterparts in terms of both performance and computational efficiency. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
|Original language||English (US)|
|Title of host publication||Proceedings of the National Conference on Artificial Intelligence|
|Number of pages||6|
|State||Published - Nov 13 2006|