[1910.11024] Simple Strategies in Multi-Objective MDPs (Technical Report)
We present an MILP encoding to find optimal pure stationary strategies on an MDP, into which a memory structure can be incorporated.
Abstract: We consider the verification of multiple expected reward objectives at once
on Markov decision processes (MDPs). This enables a trade-off analysis among
multiple objectives by obtaining the Pareto front. We focus on strategies that
are easy to employ and implement, that is, strategies that are pure (no
randomization) and have bounded memory. We show that checking whether a point
is achievable by a pure stationary strategy is NP-complete, even for two
objectives, and we provide an MILP encoding to solve the corresponding problem.
The bounded memory case can be reduced to the stationary one by a product
construction. Experimental results using Storm and Gurobi show the feasibility
of our algorithms.
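To make the achievability check concrete, below is a minimal sketch of an occupation-measure MILP with Gurobi (gurobipy) for a toy instance with two reachability objectives. The example MDP, the thresholds, and the big-M linking constant are illustrative assumptions; this is not the paper's exact encoding, only the general flavor of it.

```python
# Sketch: is the point (0.7, 0.2) achievable by a pure stationary strategy?
# Occupation-measure variables x[s, a] carry probability flow; binary sigma[s, a]
# selects exactly one action per state, and a big-M constraint links the two.
import gurobipy as gp
from gurobipy import GRB

# Toy MDP (assumed for illustration): transient states s0, s1 and absorbing
# goal states g1, g2 of the two reachability objectives.
transitions = {
    ("s0", "a"): {"s1": 0.5, "g1": 0.5},
    ("s0", "b"): {"g2": 1.0},
    ("s1", "a"): {"g1": 1.0},
    ("s1", "b"): {"g1": 0.5, "g2": 0.5},
}
transient = ["s0", "s1"]
init = {"s0": 1.0, "s1": 0.0}
thresholds = {"g1": 0.7, "g2": 0.2}   # point whose achievability we check
BIG_M = 1.0                           # valid bound on expected visits in this acyclic toy

m = gp.Model("psma-sketch")
keys = list(transitions)
x = m.addVars(keys, lb=0.0, name="x")                      # expected visits of (state, action)
sigma = m.addVars(keys, vtype=GRB.BINARY, name="sigma")    # pure choice indicator

for s in transient:
    acts = [k for k in keys if k[0] == s]
    # exactly one action is selected per state (pure, stationary strategy)
    m.addConstr(gp.quicksum(sigma[k] for k in acts) == 1, name=f"pure_{s}")
    # flow conservation: visits of s = initial mass + incoming probability flow
    incoming = gp.quicksum(transitions[k].get(s, 0.0) * x[k] for k in keys)
    m.addConstr(gp.quicksum(x[k] for k in acts) == init[s] + incoming, name=f"flow_{s}")

for k in keys:
    # only the selected action may carry flow
    m.addConstr(x[k] <= BIG_M * sigma[k], name=f"link_{k[0]}_{k[1]}")

for goal, t in thresholds.items():
    # reachability probability = probability mass flowing into the absorbing goal
    reach = gp.quicksum(transitions[k].get(goal, 0.0) * x[k] for k in keys)
    m.addConstr(reach >= t, name=f"thr_{goal}")

m.optimize()
if m.status == GRB.OPTIMAL:
    choice = {k[0]: k[1] for k in keys if sigma[k].X > 0.5}
    print("achievable by the pure stationary strategy:", choice)
else:
    print("point not achievable by any pure stationary strategy")
```

The model has no objective function, so Gurobi reports an optimal (i.e., feasible) solution exactly when the point is achievable, and the chosen actions can be read off the binary variables.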
Figure captions from the paper:
- Figure 1: MDP where double-circle (resp. square) states belong to goal set G1 (resp. G2), and a plot of the pure stationary and general Pareto fronts for reachability objectives. (Objectives)
- Figure 2: MILP encoding for a PSMA instance with a unichain MDP and finite rewards. (Unichain MDP and Finite Rewards)
- Figure 3: MILP encoding for total reward objectives. (Alternative Encoding for Total Rewards)
- Figure 4: (a) Multichain MDP. (b) MDP where pure strategies need nontrivial memory for multiple reachability objectives; double-circle (resp. square) states belong to goal set G1 (resp. G2). (Extension to Multichain MDP)
- Figure 5: MILP encoding for detection of end components. (Extension to Multichain MDP)
- Figure 11: Comparison of the two encodings (left) and impact of memory usage (right). (Evaluation)
- Figure 12: MILP encoding for total reward objectives. (Extensions for Alternative Encoding)
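The reduction of the bounded-memory case to the stationary one builds the product of the MDP with a finite memory structure; pure stationary strategies on the product then correspond to pure bounded-memory strategies on the original MDP. The following sketch uses assumed plain-dictionary data structures (not Storm's API), and the one-bit example memory is purely illustrative.

```python
# Sketch of the product construction: MDP x deterministic memory structure.
from itertools import product

def build_product(mdp_transitions, states, memory_nodes, memory_update,
                  init_state, init_memory):
    """Return the product MDP of an MDP and a deterministic memory structure.

    mdp_transitions: dict mapping (state, action) -> {successor: probability}
    memory_update:   function (memory_node, state, action, successor) -> memory_node
    """
    prod_transitions = {}
    for (s, a), dist in mdp_transitions.items():
        for n in memory_nodes:
            prod_dist = {}
            for succ, p in dist.items():
                n_next = memory_update(n, s, a, succ)
                # merge probabilities leading to the same (successor, memory) pair
                prod_dist[(succ, n_next)] = prod_dist.get((succ, n_next), 0.0) + p
            prod_transitions[((s, n), a)] = prod_dist
    prod_states = list(product(states, memory_nodes))
    return prod_states, prod_transitions, (init_state, init_memory)

# Illustrative memory structure: one bit that is set once a goal state (here "g1",
# a hypothetical name) has been visited. A pure stationary strategy on the product
# can then act differently before and after that visit, i.e. it uses one bit of
# memory on the original MDP.
one_bit_nodes = [0, 1]
def remember_g1(n, s, a, succ, goal_set=frozenset({"g1"})):
    return 1 if (n == 1 or succ in goal_set) else n
```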