Stick-breaking policy learning in Dec-POMDPs

Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Scopus citations

Abstract

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from the optimal value. This paper represents the local policy of each agent using variable-sized FSCs that are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
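The stick-breaking prior mentioned in the abstract can be illustrated with a minimal sketch of a generic truncated stick-breaking draw. The abstract does not give the paper's exact parameterization, so the function name `stick_breaking_weights`, the concentration parameter `alpha`, and the truncation level are assumptions made here for illustration only. In an FSC setting, such weights could plausibly serve as a distribution over controller nodes, with most of the mass falling on a few nodes so that the effective controller size is not fixed in advance.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw `truncation` non-negative weights that sum to 1.

    Generic truncated stick-breaking (an illustrative assumption, not the
    paper's exact prior): each step breaks off a Beta(1, alpha) fraction
    of the stick that is still unassigned.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for i in range(truncation):
        # The final piece takes everything that is left, so the
        # truncated weights sum to one.
        if i == truncation - 1:
            fraction = 1.0
        else:
            fraction = rng.betavariate(1.0, alpha)
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return weights


rng = random.Random(0)  # fixed seed so the draw is reproducible
weights = stick_breaking_weights(alpha=1.0, truncation=10, rng=rng)
```

Smaller values of `alpha` tend to put most of the mass on the first few weights, while larger values spread it over more components; under the assumption above, this is what would let the number of effectively used controller nodes vary.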
Original language: English (US)
Title of host publication: IJCAI International Joint Conference on Artificial Intelligence
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 2011-2018
Number of pages: 8
ISBN (Print): 9781577357384
State: Published - Jan 1 2015
Externally published: Yes
