We investigate two perturbation approaches to overcome the conservatism that optimism-based algorithms chronically suffer from in practice. The first approach replaces optimism with simple randomization when using confidence sets. The second adds random perturbations to the current estimate before maximizing the expected reward. For non-stationary linear bandits, where each action is associated with a $d$-dimensional feature and the unknown parameter is time-varying with total variation $B_T$, we propose two randomized algorithms, Discounted Randomized LinUCB (D-RandLinUCB) and Discounted Linear Thompson Sampling (D-LinTS), via these two perturbation approaches. We highlight the trade-off between statistical optimality and computational efficiency: the former asymptotically achieves the minimax-optimal dynamic regret $\tilde{\mathcal{O}}(d^{2/3} B_T^{1/3} T^{2/3})$, whereas the latter is oracle-efficient at the cost of an extra logarithmic factor in the number of arms relative to the minimax-optimal dynamic regret. In a simulation study, both algorithms empirically show outstanding performance in tackling the conservatism issue that Discounted LinUCB (D-LinUCB) struggles with.
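To make the two mechanisms concrete, here is a minimal Python sketch of both perturbation approaches on top of a discounted least-squares estimator. The function names (`d_rand_linucb`, `d_lints`, `update`), the discounting convention, and the noise scales are illustrative assumptions for exposition, not the paper's exact specification or tuning.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, gamma, lam, beta = 5, 20, 0.99, 1.0, 1.0  # illustrative constants

V = lam * np.eye(d)  # discounted Gram matrix (regularization kept fixed)
b = np.zeros(d)      # discounted reward-weighted feature sum

def theta_hat():
    # Discounted regularized least-squares estimate of the parameter.
    return np.linalg.solve(V, b)

def d_rand_linucb(arms):
    # Approach 1: keep the confidence-set geometry but randomize the
    # exploration scale, replacing the deterministic optimism bonus
    # with a random multiplier Z (Gaussian here, by assumption).
    Z = abs(rng.normal())
    Vinv = np.linalg.inv(V)
    widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, Vinv, arms))
    return int(np.argmax(arms @ theta_hat() + Z * beta * widths))

def d_lints(arms):
    # Approach 2: perturb the point estimate itself, then act greedily,
    # in the spirit of linear Thompson sampling.
    Vinv = np.linalg.inv(V)
    theta_tilde = rng.multivariate_normal(theta_hat(), beta**2 * Vinv)
    return int(np.argmax(arms @ theta_tilde))

def update(x, r):
    # Discounting downweights old data so the estimator can track a
    # drifting parameter; this keeps the regularizer at lam * I.
    global V, b
    V = gamma * V + np.outer(x, x) + (1 - gamma) * lam * np.eye(d)
    b = gamma * b + r * x

# Toy interaction loop against a hypothetical slowly drifting parameter.
theta_star = rng.normal(size=d)
for t in range(100):
    arms = rng.normal(size=(K, d))
    a = d_lints(arms)  # or d_rand_linucb(arms)
    r = arms[a] @ theta_star + 0.1 * rng.normal()
    update(arms[a], r)
    theta_star += 0.01 * rng.normal(size=d)  # drift contributing to B_T
```

The sketch highlights the design difference: the first approach randomizes the scale of the confidence-width bonus, while the second samples a perturbed parameter and maximizes a single linear function over the arms, which is why it only needs an argmax (linear optimization) oracle.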