Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
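For the classic Bernoulli bandit, this belief is usually a Beta posterior per arm: each round we draw one sample from every posterior and pull the arm with the largest draw. The sketch below is a minimal illustration of that idea (function and variable names are ours, not from any of the papers listed here):

```python
import random

def thompson_sampling(true_probs, n_rounds, rng=None):
    """Bernoulli Thompson sampling with Beta(1, 1) priors.

    Maintains a Beta posterior per arm; each round, draws one sample
    from every posterior and pulls the arm whose draw is largest.
    """
    rng = rng or random.Random(0)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta alpha parameters (uniform prior)
    failures = [1] * n_arms   # Beta beta parameters (uniform prior)
    total_reward = 0
    for _ in range(n_rounds):
        # One draw from each arm's posterior belief.
        samples = [rng.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

Because the posterior of a clearly inferior arm rarely produces the largest draw, play concentrates on the best arm over time while still occasionally exploring the others.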
This work benchmarks well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems and finds that many approaches that have been successful in the supervised learning setting underperformed in the sequential decision-making scenario.
This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, product recommendation, assortment, active learning with neural networks, and reinforcement learning in Markov decision processes.
This work explores adaptations of successful multi-armed bandit policies to the online contextual bandit scenario with binary rewards, using binary classification algorithms such as logistic regression as black-box oracles, yielding approaches that are more scalable than previous work and compatible with any classification algorithm.
Two perturbation approaches are investigated to overcome the conservatism that optimism-based algorithms chronically suffer from in practice, and both empirically show outstanding performance in tackling the conservatism issue that Discounted LinUCB (D-LinUCB) struggles with.
Thompson Sampling-style algorithms for mean-variance MAB and comprehensive regret analyses for Gaussian and Bernoulli bandits with fewer assumptions are developed and shown to significantly outperform existing LCB-based algorithms for all risk tolerances.
This work proposes the Convolutional Neural Process (ConvNP), which endows Neural Processes (NPs) with translation equivariance and extends convolutional conditional NPs to allow for dependencies in the predictive distribution, and proposes a new maximum-likelihood objective to replace the standard ELBO objective in NPs.
This paper proposes a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation, with a novel posterior distribution of the reward, where its mean is the neural network approximator, and its variance is built upon the neural tangent features of the corresponding neural network.
A variational Bayesian Recurrent Neural Net recommender system is introduced that acts on time series of interactions between the internet platform and the user, and that scales to real-world industrial settings.
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem is answered positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret.
A generalization of the Thompson Sampling algorithm is designed and analyzed for the stochastic contextual multi-armed bandit problem with linear payoff functions, where the contexts are provided by an adaptive adversary.
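In this linear-payoff setting (LinTS, in the style of Agrawal and Goyal), the belief is a Gaussian posterior over the weight vector: sample a parameter vector, score each context with it, and play the highest-scoring arm. The following is a hedged sketch of one round; the function names and the exploration scale `v` are illustrative, not code from the paper:

```python
import numpy as np

def lin_ts_step(B, f, context_vectors, v=1.0, rng=None):
    """One round of Linear Thompson Sampling (illustrative sketch).

    B: d x d design matrix (initialized to the identity).
    f: length-d vector of reward-weighted contexts.
    Samples theta ~ N(B^{-1} f, v^2 * B^{-1}) and returns the index
    of the arm whose context maximizes the sampled linear reward.
    """
    rng = rng or np.random.default_rng(0)
    B_inv = np.linalg.inv(B)
    mu = B_inv @ f
    theta = rng.multivariate_normal(mu, v ** 2 * B_inv)
    scores = context_vectors @ theta
    return int(np.argmax(scores))

def lin_ts_update(B, f, x, reward):
    """Rank-one posterior update after observing (context x, reward)."""
    B = B + np.outer(x, x)
    f = f + reward * x
    return B, f
```

Each observed (context, reward) pair tightens the Gaussian posterior via the rank-one update, so sampled parameters concentrate around the least-squares estimate as data accumulates.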