Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

Published in

International Conference on Machine Learning (2016)

TL;DR

The paper proposes the SWITCH estimator, which can use an existing reward model to achieve a better bias-variance tradeoff than IPS and DR. It proves an upper bound on the estimator's MSE and demonstrates its benefits empirically on a diverse collection of data sets, often outperforming prior work by orders of magnitude.

Abstract

We study the off-policy evaluation problem (estimating the value of a target policy using data collected by another policy) under the contextual bandit model. We consider the general (agnostic) setting without access to a consistent model of rewards and establish a minimax lower bound on the mean squared error (MSE). The bound is matched up to constants by the inverse propensity scoring (IPS) and doubly robust (DR) estimators. This highlights the difficulty of the agnostic contextual setting, in contrast with multi-armed bandits and contextual bandits with access to a consistent reward model, where IPS is suboptimal. We then propose the SWITCH estimator, which can use an existing reward model (not necessarily consistent) to achieve a better bias-variance tradeoff than IPS and DR. We prove an upper bound on its MSE and demonstrate its benefits empirically on a diverse collection of data sets, often outperforming prior work by orders of magnitude.
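The abstract names three estimators without giving their formulas. The sketch below shows one common way to write them in plain Python, to make the bias-variance idea concrete. It is an illustration under stated assumptions, not the paper's reference implementation: the function names, the data layout (one list of per-action probabilities and model predictions per logged round), and the threshold parameter `tau` are all hypothetical, and the SWITCH rule shown (use the importance-weighted reward when the weight is at most `tau`, otherwise fall back to the reward model) is assumed rather than taken from the text above.

```python
from typing import List


def importance_weights(pi: List[float], mu: List[float]) -> List[float]:
    """Per-action weights pi(a|x) / mu(a|x); assumes mu(a|x) > 0 for every action."""
    return [p / q for p, q in zip(pi, mu)]


def ips_estimate(rewards, actions, target_dists, logging_dists) -> float:
    """Inverse propensity scoring: average of reward times importance weight."""
    total = 0.0
    for r, a, pi, mu in zip(rewards, actions, target_dists, logging_dists):
        total += r * importance_weights(pi, mu)[a]
    return total / len(rewards)


def dr_estimate(rewards, actions, target_dists, logging_dists, model_rewards) -> float:
    """Doubly robust: model prediction plus an importance-weighted residual."""
    total = 0.0
    for r, a, pi, mu, r_hat in zip(
        rewards, actions, target_dists, logging_dists, model_rewards
    ):
        w = importance_weights(pi, mu)
        model_value = sum(p * m for p, m in zip(pi, r_hat))
        total += model_value + w[a] * (r - r_hat[a])
    return total / len(rewards)


def switch_estimate(
    rewards, actions, target_dists, logging_dists, model_rewards, tau: float
) -> float:
    """Assumed SWITCH rule: IPS where the weight is <= tau, reward model elsewhere."""
    total = 0.0
    for r, a, pi, mu, r_hat in zip(
        rewards, actions, target_dists, logging_dists, model_rewards
    ):
        w = importance_weights(pi, mu)
        # Importance-weighted term, kept only if the logged action's weight is small.
        if w[a] <= tau:
            total += r * w[a]
        # Model-based term over every action whose weight exceeds the threshold.
        total += sum(pi[b] * r_hat[b] for b in range(len(pi)) if w[b] > tau)
    return total / len(rewards)


# Toy log with two actions per round (all numbers are made up for illustration).
rewards = [1.0, 0.0, 1.0]
actions = [0, 1, 1]
target_dists = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
logging_dists = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
model_rewards = [[0.8, 0.3], [0.6, 0.4], [0.1, 0.7]]

for tau in (0.0, 1.0, float("inf")):
    value = switch_estimate(
        rewards, actions, target_dists, logging_dists, model_rewards, tau
    )
    print(f"tau={tau}: {value:.4f}")
```

In this formulation, setting `tau` to infinity recovers IPS, while `tau = 0` uses only the reward model, so intermediate thresholds trade the variance of large importance weights against the bias of the model.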

Authors

Alekh Agarwal

4 papers

Yu-Xiang Wang

4 papers

Miroslav Dudík

2 papers

References (32 items)

The Value of Knowing the Propensity Score for Estimating Average Treatment Effects

Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Toward Minimax Off-policy Value Estimation

Doubly Robust Policy Evaluation and Optimization

Counterfactual reasoning and learning systems: the example of computational advertising

Improved double-robust estimation in missing data and causal inference models.

Doubly Robust Policy Evaluation and Learning

Learning Bounds for Importance Weighting

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

Covariate Shift by Kernel Mean Matching

Doubly Robust Estimation in Missing Data and Causal Inference Models

Mean-Squared-Error Calculations for Average Treatment Effects

Asymptotically exact minimax estimation in sup-norm for anisotropic Hölder classes

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score

On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects

Semiparametric Efficiency in Multivariate Regression Models with Missing Data

Statistical Analysis With Missing Data

Statistical Analysis with Missing Data

Statistics and Causal Inference

Some results on generalized difference estimation and generalized regression estimation for finite populations

A Generalization of Sampling Without Replacement from a Finite Universe

Doubly Robust Off-policy Evaluation for Reinforcement Learning

URL http://www.stat.cmu.edu/~larry/=sml/Minimax.pdf

Minimax theory

Data-adaptive selection of the truncation level for Inverse-Probability-of-Treatment-Weighted estimators

The exact law of large numbers via Fubini extension and characterization of insurable risks

Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score 1

Probability Inequalities for Sums of Bounded Random Variables

Weighting adjustment for unit nonresponse. Incomplete data in sample surveys

Sur les fonctions d'ensemble additives et continues

the corresponding law of large numbers

Field of Study

Computer Science, Mathematics

Journal Information

Name

ArXiv

Volume

abs/2005.00687

Venue Information

Name

International Conference on Machine Learning

Type

conference

URL

https://icml.cc/

Alternate Names

ICML
Int Conf Mach Learn
