
Expectation-maximization for Bayes-adaptive POMDPs

Author: Vargo, Erik
Advisor: Cogill, Randy
Format: Thesis/Dissertation; Online
Partially observable Markov decision processes, or POMDPs, are used extensively in modeling the complex interactions between an agent and a dynamic, stochastic environment. When all model parameters are known, near-optimal solutions to the reward maximization problem can be obtained through approximate value iteration. Unfortunately, in many real-world applications a POMDP formulation may not be justified due to uncertainty in the underlying hidden Markov model parameters. However, if model uncertainty can be characterized by a prior distribution over the state-transition and observation-emission probabilities, it is natural to seek Bayes optimal policies which maximize the expected reward subject to this distribution. The coupling of a POMDP with a model prior was recently formalized as the Bayes-adaptive POMDP (BAPOMDP) and various online and offline algorithms have since been proposed for this class of problems, the most popular of which are inspired by approximate POMDP value iteration. Despite its success when applied to small benchmark BAPOMDPs, empirical results suggest that value iteration may be inadequate as the degree of model uncertainty increases. As an alternative, in this dissertation we explore expectation-maximization approaches to solving BAPOMDPs, which have the potential to scale more gracefully with both the number of uncertain model parameters and their assumed variability.
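The belief-state recursion that both value-iteration and expectation-maximization methods build on can be sketched as follows. This is a minimal illustrative example, not code from the dissertation; the two-state transition and observation matrices are hypothetical.

```python
def belief_update(belief, T, O, obs):
    """Bayes-filter update for a POMDP: b'(s') ∝ O(obs | s') * sum_s T(s' | s) * b(s)."""
    n = len(belief)
    # Predict step: propagate the belief through the transition model.
    predicted = [sum(T[s][s2] * belief[s] for s in range(n)) for s2 in range(n)]
    # Correct step: weight by the observation likelihood, then normalize.
    unnorm = [O[s2][obs] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

# Hypothetical model for a single action: T[s][s'] and O[s'][o].
T = [[0.9, 0.1],
     [0.2, 0.8]]
O = [[0.7, 0.3],
     [0.4, 0.6]]

b = belief_update([0.5, 0.5], T, O, obs=0)  # posterior belief after observing o = 0
```

In the Bayes-adaptive setting described above, the transition and observation probabilities themselves are uncertain, so the belief is maintained jointly over the hidden state and the model parameters rather than over the state alone.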
University of Virginia, Department of Systems Engineering, PhD (Doctor of Philosophy), 2013
Libra ETD Repository
Rights: In Copyright