Technical Note — The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

By Nima Hamidi, Mohsen Bayati

Operations Research

July 2023, Vol. 71, Issue 4, Pages 1021–1439.

Operations, Information & Technology

In this note, we introduce a general version of the well-known elliptical potential lemma, a widely used technique in the analysis of algorithms for sequential learning and decision-making problems. We consider a stochastic linear bandit setting in which a decision maker sequentially chooses among a set of given actions, observes their noisy rewards, and aims to maximize the cumulative expected reward over a decision-making horizon. The elliptical potential lemma is a key tool for quantifying uncertainty in estimating the parameters of the reward function, but it requires the noise and prior distributions to be Gaussian. Our general elliptical potential lemma relaxes this Gaussian requirement, which is a highly nontrivial extension for several reasons: unlike in the Gaussian case, there is no closed-form expression for the covariance matrix of the posterior distribution, the covariance matrix is not a deterministic function of the actions, and the covariance matrix is not decreasing with respect to the positive semidefinite order. Although this result is of broad interest, we showcase one application: an improved Bayesian regret bound for the well-known Thompson sampling algorithm in stochastic linear bandits with changing action sets, where the prior and noise distributions are general. This bound is minimax optimal up to constants.
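For intuition, here is a minimal sketch that numerically checks the classical (Gaussian) form of the lemma, not the general version introduced in this note. It assumes V_0 = λI and V_t = V_{t-1} + x_t x_tᵀ with ‖x_t‖ ≤ L. With a N(0, (σ²/λ)I) prior and N(0, σ²) noise, the posterior covariance is σ²V_t⁻¹, which is exactly the deterministic, decreasing quantity the abstract contrasts with the general case. The classical statement is Σ_t min(1, ‖x_t‖²_{V_{t-1}⁻¹}) ≤ 2 log(det V_T / det V_0) ≤ 2d log(1 + TL²/(dλ)). The function names `elliptical_potential`, `classical_bound`, and `random_unit_vectors` are hypothetical, and the actions are synthetic random unit vectors rather than anything from the paper.

```python
import math
import random


def elliptical_potential(xs, lam=1.0):
    """Return (sum_t min(1, ||x_t||^2_{V_{t-1}^{-1}}), log(det V_T / det V_0)).

    Assumes V_0 = lam * I and V_t = V_{t-1} + x_t x_t^T (classical Gaussian case).
    The inverse is updated with Sherman-Morrison and the log-determinant with
    the matrix determinant lemma, so no matrix library is needed.
    """
    d = len(xs[0])
    # V_0^{-1} = I / lam
    v_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    potential = 0.0
    log_det_ratio = 0.0
    for x in xs:
        # u = V_{t-1}^{-1} x and w = ||x||^2 in the V_{t-1}^{-1} norm
        u = [sum(v_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        w = sum(x[i] * u[i] for i in range(d))
        potential += min(1.0, w)
        # Matrix determinant lemma: det(V + x x^T) = det(V) * (1 + w)
        log_det_ratio += math.log1p(w)
        # Sherman-Morrison (V symmetric): (V + x x^T)^{-1} = V^{-1} - u u^T / (1 + w)
        for i in range(d):
            for j in range(d):
                v_inv[i][j] -= u[i] * u[j] / (1.0 + w)
    return potential, log_det_ratio


def classical_bound(T, d, lam=1.0, L=1.0):
    """Upper bound 2 d log(1 + T L^2 / (d lam)), valid when every ||x_t|| <= L."""
    return 2.0 * d * math.log(1.0 + T * L * L / (d * lam))


def random_unit_vectors(T, d, rng):
    """Hypothetical synthetic actions drawn uniformly on the unit sphere."""
    xs = []
    for _ in range(T):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        xs.append([c / norm for c in v])
    return xs


if __name__ == "__main__":
    T, d, lam = 1000, 5, 1.0
    xs = random_unit_vectors(T, d, random.Random(0))
    pot, log_ratio = elliptical_potential(xs, lam)
    print(f"potential              = {pot:.3f}")
    print(f"2 log det ratio        = {2 * log_ratio:.3f}")
    print(f"2 d log(1 + T/(d lam)) = {classical_bound(T, d, lam):.3f}")
```

The first inequality holds term by term because min(1, w) ≤ 2 log(1 + w) for w ≥ 0, and the second follows from det V_T ≤ (tr V_T / d)^d. The incremental updates keep each step at O(d²) work; in the general, non-Gaussian setting of this note, the posterior covariance no longer follows this deterministic update, which is what the general lemma has to handle.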

Related

Mohsen Bayati

Professor, Operations, Information & Technology

© Stanford Graduate School of Business
