Online expectation maximization for reinforcement learning in POMDPs

Miao Liu, Xuejun Liao, Lawrence Carin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only on the current learning episode, discarding the episode after the evaluation and memorizing a sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has O(n) time complexity and O(1) memory complexity, compared to O(n²) and O(n) for the corresponding batch-mode algorithm, where n is the number of learning episodes. The online algorithm, which is provably convergent, is demonstrated on five benchmark POMDP problems.
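The core idea stated in the abstract, keeping only a running sufficient statistic and recovering the policy from it in closed form, can be illustrated with a minimal sketch. This is not the authors' algorithm (the paper's nested E- and M-steps are not reproduced here); it is a generic online stochastic-approximation update in which each episode's statistic is folded into a decayed running average and then discarded, so memory stays O(1) and per-episode cost O(1) in the number of episodes n. The statistic shape, the 1/n step size, and the normalizing "closed-form" M-step are all illustrative assumptions.

```python
import numpy as np

def online_em_update(stat, episode_stat, n):
    """One hypothetical online E-step: fold the current episode's
    sufficient statistic into the running statistic with step size
    1/n, then discard the episode. Memory is O(1) in the episode
    count; with gamma = 1/n the running statistic equals the plain
    average over all episodes seen so far."""
    gamma = 1.0 / n  # any Robbins-Monro step-size schedule would also work
    return (1.0 - gamma) * stat + gamma * episode_stat

def closed_form_policy(stat):
    """Hypothetical closed-form M-step: normalize nonnegative
    per-action statistics into a stochastic policy."""
    total = stat.sum()
    if total == 0.0:
        return np.full_like(stat, 1.0 / stat.size)  # uniform fallback
    return stat / total
```

A batch algorithm would instead store all n episodes and re-evaluate them at every iteration, giving the O(n) memory and O(n²) total time the abstract contrasts against.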
Original language: English (US)
Title of host publication: IJCAI International Joint Conference on Artificial Intelligence
Pages: 1501-1507
Number of pages: 7
State: Published - Dec 1 2013
Externally published: Yes

