


Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards

By Omar Besbes, Yonatan Gur, Assaf Zeevi

Advances in Neural Information Processing Systems (NIPS)

2014 Vol. 27 Pages 199-207.

Operations, Information & Technology


In a multi-armed bandit (MAB) problem, a gambler must choose one of K arms at each round of play, each arm characterized by an unknown reward distribution. Reward realizations are observed only when an arm is selected, and the gambler’s objective is to maximize cumulative expected earnings over a given horizon of play T. To do this, the gambler needs to acquire information about the arms (exploration) while simultaneously optimizing immediate rewards (exploitation). The price paid for this trade-off is often referred to as the regret, and the main question is how small this price can be as a function of the horizon length T. This problem has been studied extensively when the reward distributions do not change over time, an assumption that supports a sharp characterization of the regret yet is often violated in practical settings. In this paper, we focus on a MAB formulation that allows for a broad range of temporal uncertainties in the rewards while still maintaining mathematical tractability. We fully characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward “variation” and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.
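To make the setting concrete, the following is a minimal Python sketch of a two-armed bandit whose mean rewards change over time, with regret measured against the best arm at each round. Everything in it is an illustrative assumption rather than something taken from the abstract: the reward means in `mean_rewards`, the choice of an Exp3-style policy that resets its weights every `batch` rounds, and the parameter names `batch` and `gamma`. It should not be read as the policy analyzed in the paper.

```python
import math
import random


def mean_rewards(t, horizon):
    """Hypothetical non-stationary means for two arms: the best arm swaps halfway."""
    if t < horizon // 2:
        return [0.8, 0.2]
    return [0.2, 0.8]


def exp3_with_restarts(horizon, batch, gamma, rng):
    """Run an Exp3-style policy that resets its weights every `batch` rounds.

    Returns the cumulative regret against the best arm at each round.
    """
    k = 2
    weights = [1.0] * k
    regret = 0.0
    for t in range(horizon):
        if t % batch == 0:
            weights = [1.0] * k  # restart: discard possibly stale information
        mu = mean_rewards(t, horizon)
        total = sum(weights)
        # Mix the weight-based distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < mu[arm] else 0.0  # Bernoulli reward
        # Importance-weighted update: only the pulled arm's reward is observed.
        weights[arm] *= math.exp(gamma * (reward / probs[arm]) / k)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale for numerical stability
        regret += max(mu) - mu[arm]
    return regret
```

A call such as `exp3_with_restarts(2000, 200, 0.1, random.Random(0))` returns the regret for one simulated run. Setting `batch` equal to `horizon` disables the restarts, which gives a simple way to compare how the two variants cope with the mid-horizon change under these assumed reward means.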
