Point-Based Value Iteration for Constrained POMDPs
Dongho Kim, Jaesong Lee, Kee-Eung Kim and Pascal Poupart
Constrained partially observable Markov decision processes (CPOMDPs) extend standard POMDPs by allowing constraints on some aspects of the policy to be specified in addition to the optimality objective for the value function. CPOMDPs have many practical advantages over standard POMDPs, since they naturally model problem settings with limited resources or multiple objectives. In this paper, we show that optimal policies in CPOMDPs can be non-deterministic, and we present exact and approximate dynamic programming methods for computing randomized optimal policies. Whereas the exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, the approximate method uses point-based value updates, each of which requires solving only a linear program (LP). We show that randomized policies are significantly better than deterministic ones, and that the approximate method retains the scalability of point-based methods for standard POMDPs.
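As a rough illustration of the point-based update mentioned in the abstract, the LP solved at a single belief point can be sketched as follows. The function name, the representation of backed-up reward/cost alpha-vectors, and the use of scipy's linprog are our assumptions for illustration only, not the paper's implementation or notation: given candidate vector pairs, the LP finds randomized weights that maximize the expected reward value at the belief point subject to the admissible cost bound.

    import numpy as np
    from scipy.optimize import linprog

    def point_based_lp_update(b, reward_vectors, cost_vectors, cost_bound):
        """Sketch of the LP at one belief point b (hypothetical helper).

        reward_vectors, cost_vectors: lists of backed-up alpha-vectors,
        one pair per candidate choice; cost_bound is the admissible
        expected cumulative cost at b.
        """
        # Reward and cost value of each candidate vector at belief b.
        r = np.array([np.dot(a, b) for a in reward_vectors])
        c = np.array([np.dot(a, b) for a in cost_vectors])
        n = len(r)
        # Maximize w.r subject to w.c <= cost_bound, sum(w) = 1, w >= 0.
        # linprog minimizes, so negate the reward objective.
        res = linprog(-r,
                      A_ub=c.reshape(1, -1), b_ub=[cost_bound],
                      A_eq=np.ones((1, n)), b_eq=[1.0],
                      bounds=[(0.0, 1.0)] * n)
        return res.x  # randomized weights over the candidate vectors

When the cost constraint binds, the optimal weight vector can place mass on more than one candidate, which is exactly why optimal CPOMDP policies may need to be randomized.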