Optimal and Adaptive Off-policy Evaluation in Contextual Bandits (2016-12-04)