policytree: Policy Learning via Doubly Robust Empirical Welfare Maximization over Trees

By
Erik Sverdrup, Ayush Kanodia, Zhengyuan Zhou, Susan Athey, Stefan Wager
The Journal of Open Source Software, June 2020, Vol. 5, Issue 50

The problem of learning treatment assignment policies from randomized or observational data arises in many fields. For example, in personalized medicine, we seek to map patient observables (like age, gender, blood pressure, etc.) to a treatment choice using a data-driven rule. There has recently been a considerable amount of work on statistical methodology for policy learning, including Manski (2004), Zhao, Zeng, Rush, & Kosorok (2012), Swaminathan & Joachims (2015), Kitagawa & Tetenov (2018), van der Laan & Luedtke (2015), Luedtke & van der Laan (2016), Mbakop & Tabord-Meehan (2016), Athey & Wager (2017), Kallus & Zhou (2018), and Zhou, Athey, & Wager (2018). In particular, Kitagawa & Tetenov (2018) show that if we only consider policies π restricted to a class Π with finite VC dimension and have access to data from a randomized trial with n samples, then an empirical welfare maximization algorithm achieves regret that scales as √(VC(Π)/n). Athey & Wager (2017) extend this result to observational studies via doubly robust scoring, and Zhou et al. (2018) further consider the case with multiple treatment choices (in particular, the regret will depend on the tree depth, feature space, and number of actions).

The package policytree for R (R Core Team, 2020) implements the multi-action doubly robust approach of Zhou et al. (2018) in the case where we want to learn policies π that belong to the class Π of depth-k decision trees. In order to use policytree, the user starts by specifying a set of doubly robust scores for policy evaluation; the software then carries out a globally optimal search over decision trees, weighted by those scores.
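A minimal sketch of this workflow is below, assuming the companion grf package is used to form the doubly robust scores (the simulated outcome model here is purely illustrative):

```r
library(policytree)  # exact tree search over doubly robust scores
library(grf)         # forests used to estimate the scores

# Simulated data: n individuals, p covariates, three treatment arms.
n <- 500; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- as.factor(sample(c("A", "B", "C"), n, replace = TRUE))
Y <- X[, 1] + (W == "B") * X[, 2] + (W == "C") * X[, 3] + rnorm(n)

# Step 1: doubly robust scores for each action, from a multi-arm forest.
forest <- multi_arm_causal_forest(X, Y, W)
Gamma <- double_robust_scores(forest)

# Step 2: globally optimal depth-2 decision tree over those scores.
tree <- policy_tree(X, Gamma, depth = 2)

# The fitted tree maps covariates to a recommended action index.
predict(tree, X[1:5, ])
```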

It is well known that finding an optimal tree of arbitrary depth is NP-hard. However, if we restrict our attention to trees of depth k, then the problem can be solved in polynomial time. Here, we implement the global optimization via an exhaustive (unconstrained) tree search that runs in O(P^k N^k (log N + D) + P N log N) time, where N is the number of individuals, P the number of characteristics observed for each individual, and D is the number of available treatment choices (see details below). If an individual's characteristics only take on a few discrete values, the runtime can be reduced by a factor of N^k. Additionally, an optional approximation parameter lets the user control how many splits to consider.
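Both speedups can be exercised through `policy_tree` directly: the approximation parameter corresponds to the `split.step` argument, and rounding continuous covariates induces the discrete-value case. A self-contained sketch (the placeholder scores stand in for the doubly robust scores described above):

```r
library(policytree)

n <- 1000; p <- 10
X <- matrix(rnorm(n * p), n, p)
# Placeholder reward matrix: one column per action (in practice,
# doubly robust scores as described in the text).
Gamma <- matrix(rnorm(n * 3), n, 3)

# split.step = 10: consider only every 10th covariate value as a split
# candidate, trading exact optimality for roughly 10x fewer splits.
approx.tree <- policy_tree(X, Gamma, depth = 2, split.step = 10)

# Rounding covariates to few distinct values also speeds up the exact
# search, since ties shrink the set of distinct split points.
exact.tree <- policy_tree(round(X, 1), Gamma, depth = 2)
```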