Suboptimal Policies, with Bounds, for Parameter Adaptive Decision Processes

By William Lovejoy
1995 | Working Paper No. 1105

A parameter adaptive decision process is a sequential decision process in which some parameter or parameter set affecting the rewards and/or transitions of the process is not known with certainty. Signals from the performance of the system can be processed by the decision maker as time progresses, yielding information about which parameter set is operative. Active learning is an essential feature of these processes: the decision maker must choose actions that both guide the system in a preferred direction and yield quality information that can be used to better prescribe future actions. If the operative parameter set is known with certainty, the parameter adaptive problem reduces to a conventional stochastic dynamic program, which is presumed solvable. This paper shows how to use the solutions to these tractable, conventional dynamic programs to derive suboptimal policies with performance bounds for the parameter adaptive problem. An example inventory stocking problem demonstrates the technique.
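
The two ingredients named in the abstract, learning which parameter set is operative and reusing the known-parameter solutions, can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a finite set of candidate Poisson demand rates, fully observed demand, and a reward-maximization setting; the function names and all numerical values are hypothetical placeholders. The bound shown is the standard perfect-information bound (the belief-weighted average of the known-parameter optimal values), which may differ from the bounds the paper derives.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of observing demand k under a Poisson distribution."""
    return exp(-rate) * rate**k / factorial(k)


def update_belief(belief, observed_demand):
    """Bayes' rule over a finite set of candidate demand rates.

    belief maps each candidate rate to its current probability.
    """
    unnormalized = {
        rate: p * poisson_pmf(observed_demand, rate)
        for rate, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {rate: w / total for rate, w in unnormalized.items()}


def perfect_information_bound(belief, known_parameter_values):
    """Belief-weighted average of the known-parameter optimal values.

    In a maximization setting this is an upper bound on what any
    adaptive policy can achieve, since knowing the parameter cannot hurt.
    """
    return sum(p * known_parameter_values[rate] for rate, p in belief.items())


# Hypothetical prior: demand rate is either 2 or 6, equally likely.
prior = {2.0: 0.5, 6.0: 0.5}

# Observing a demand of 5 shifts the belief toward the higher rate.
posterior = update_belief(prior, observed_demand=5)

# Placeholder optimal values, standing in for solved conventional
# dynamic programs (one per candidate rate).
values = {2.0: 10.0, 6.0: 20.0}
prior_bound = perfect_information_bound(prior, values)
posterior_bound = perfect_information_bound(posterior, values)
```

In this sketch the belief plays the role of the information state: each observation revises it, and the known-parameter values are combined under the current belief without re-solving the harder adaptive problem.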