[1910.10786] High-Confidence Policy Optimization: Reshaping Ambiguity Sets in Robust MDPs
We derive new sampling guarantees, and our experimental results show that problem-dependent shapes of the ambiguity set can significantly improve return guarantees.

Abstract: Robust MDPs are a promising framework for computing robust policies in
reinforcement learning. Ambiguity sets, which represent the plausible errors in
transition probabilities, determine the trade-off between robustness and
average-case performance. The standard practice of defining ambiguity sets
using the $L_1$ norm leads, unfortunately, to loose and impractical guarantees.
This paper describes new methods for optimizing the shape of ambiguity sets
beyond the $L_1$ norm. We derive new high-confidence sampling bounds for
weighted $L_1$ and weighted $L_\infty$ ambiguity sets and describe how to
compute near-optimal weights from rough value function estimates. Experimental
results on a diverse set of benchmarks show that optimized ambiguity sets
provide significantly tighter robustness guarantees.
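To make the robustness trade-off concrete, the sketch below solves the inner worst-case problem of a single robust Bellman update over a plain (unweighted) $L_1$ ambiguity set — the standard baseline the paper improves on, not its weighted variant. The function name `worst_case_l1`, the budget value `psi = 0.4`, and the uniform nominal distribution are illustrative assumptions; the value function matches the monotonic example v = [1, 2, 3, 4, 5] from Figure 2.

```python
import numpy as np

def worst_case_l1(p_hat, v, psi):
    """Worst-case expected value of v over the set
    {p in simplex : ||p - p_hat||_1 <= psi}.

    Standard O(n log n) procedure: shift up to psi/2 probability mass
    from the highest-value states onto the lowest-value state.
    """
    p = np.array(p_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    order = np.argsort(v)              # states sorted by value, ascending
    lowest = order[0]
    eps = min(psi / 2.0, 1.0 - p[lowest])
    p[lowest] += eps                   # add mass to the lowest-value state
    i = len(v) - 1
    while eps > 1e-12:                 # remove the same mass from the best states
        j = order[i]
        removed = min(eps, p[j])
        p[j] -= removed
        eps -= removed
        i -= 1
    return float(p @ v)

# Illustrative budget psi = 0.4 with a uniform nominal distribution:
p_hat = [0.2, 0.2, 0.2, 0.2, 0.2]
v = [1, 2, 3, 4, 5]
print(worst_case_l1(p_hat, v, 0.4))    # 2.2, versus nominal expectation 3.0
```

Shrinking or reshaping the ambiguity set (as the paper proposes) raises this guaranteed return back toward the nominal expectation.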

Figure 1: A visualization of ambiguity sets for an MDP. (Finite-Sample Guarantees)
Figure 2: Single Bellman Update: guaranteed return for a monotonic value function v = [1, 2, 3, 4, 5]. (Empirical Evaluation)
Figure 3: Single Bellman Update: the guaranteed return for a sparse value function v = [0, 0, 0, 0, −5]. (Empirical Evaluation)
Figure 6: RiverSwim problem with six states and two actions (left: dashed arrow, right: solid arrow). The agent starts in either s1 or s2. (RiverSwim MDP Graph)