Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis (2012-05-18)

TL;DR

Thompson Sampling is shown to be asymptotically optimal for the stochastic multi-armed bandit problem with Bernoulli rewards: the paper gives the first finite-time analysis whose regret bound matches the asymptotic rate of the Lai and Robbins lower bound on cumulative regret.

Abstract

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
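As a rough illustration of the setting described above, the following is a minimal sketch of Thompson Sampling for Bernoulli rewards, assuming a uniform Beta(1, 1) prior on each arm. It also computes the coefficient of log T in the Lai and Robbins lower bound, the sum over suboptimal arms of (mu* - mu_a) / KL(mu_a, mu*), which is the rate the regret is compared against. The function names, the seed handling, and the arm means in the usage note are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def lai_robbins_constant(means):
    """Coefficient of log(T) in the Lai and Robbins lower bound for Bernoulli arms."""
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


def thompson_sampling(means, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling (uniform prior assumed).

    Returns the cumulative pseudo-regret and the number of pulls per arm.
    """
    rng = random.Random(seed)
    successes = [0] * len(means)
    failures = [0] * len(means)
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample per arm from its Beta(successes + 1, failures + 1) posterior.
        samples = [
            rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
        ]
        # Play the arm whose posterior sample is largest.
        arm = max(range(len(means)), key=samples.__getitem__)
        # Observe a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        regret += best - means[arm]
    pulls = [s + f for s, f in zip(successes, failures)]
    return regret, pulls
```

For example, `thompson_sampling([0.1, 0.5, 0.9], horizon=2000)` returns the pseudo-regret of one simulated run, which can be compared informally with `lai_robbins_constant([0.1, 0.5, 0.9]) * math.log(2000)`. A single run is noisy, so such a comparison is only indicative; averaging over many seeds would be needed for anything resembling the numerical study the abstract mentions.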

Authors

R. Munos

E. Kaufmann

N. Korda

References (14 items)

On Bayesian Upper Confidence Bounds for Bandit Problems

An Empirical Evaluation of Thompson Sampling

Deviations of Stochastic Bandit Regret

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond

Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton

Regret Bounds and Minimax Policies under Partial Monitoring

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

Finite-time Analysis of the Multiarmed Bandit Problem

On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples

Optimistic Bayesian Sampling in Contextual-Bandit Problems (submitted to the Annals of Applied Probability)

An Asymptotically Optimal Bandit Algorithm for Bounded Support Models

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (25th Annual Conference on Learning Theory)

Asymptotically Efficient Adaptive Allocation Rules
