Wenjia Ba is a job market candidate from Stanford Graduate School of Business, co-advised by Prof. Haim Mendelson and Prof. Michael Harrison. Her research focuses on understanding sequential interactions in online environments using tools from machine learning, revenue management, and probability theory, with applications in online advertising, games, and emerging platforms like virtual assistants. She received her B.S. degree in Mathematics and Applied Mathematics from Fudan University in 2016.
- Sequential learning with high-dimensional data
- Digital platforms
- Game-theoretic learning
- Revenue management
(with J. Michael Harrison and Harikesh S. Nair, manuscript available upon request) We consider models of sequential decision-making by an online advertiser. In each of a sequence of trials, the advertiser first chooses the audience segment for which an impression will be purchased, then decides which ad will be shown to that user, and finally observes a binary "click or no-click" outcome. The problem of finding the best audience-ad combination is complicated by (i) low click-through rates (the average value for display advertising is below 0.5%), and (ii) high dimensionality, meaning that there are many possible combinations of observable user characteristics and available ad choices. Adopting the now-standard conceptual framework of a multi-armed bandit model, we propose and evaluate a novel algorithm (PMDL, Poisson Model with Debiased Lasso) that addresses these challenges. In numerical comparisons on synthetic test problems, our proposed method performs comparably to leading alternatives in low-dimensional settings, and it continues to perform well in high-dimensional settings where alternative methods are computationally infeasible.
(with Haim Mendelson and Mingxi Zhu, manuscript available upon request) We study the implications of selling through a voice-based virtual assistant (VA). The seller has a set of products available, and the VA decides on product ranking and pricing, seeking to maximize seller profit, consumer surplus, or total surplus. The consumer is impatient and rational, seeking to maximize her expected utility. The VA presents the products sequentially. Once a product is presented and priced, the consumer evaluates it and decides whether to purchase. The consumer's valuation comprises a pre-evaluation value, which is common knowledge, and a private post-evaluation component. We solve for the equilibria and develop efficient algorithms for implementing the solution. We examine the effects of information asymmetry on the outcomes and study how incentive misalignment depends on the private valuation distributions.
(with Tianyi Lin, Zhengyuan Zhou, and Jiawei Zhang, manuscript available upon request) We consider online no-regret learning in unknown games with bandit feedback, where each agent observes only its reward at each time step (determined by all players' current joint action) rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. This is a generic problem that covers a wide range of economic models, including Cournot competition, where the price is jointly determined by all the firms, and the Kelly auction, where the outcome is jointly determined by the bids from all participants. Our work identifies the first doubly optimal bandit learning algorithm for a class of smooth and strongly monotone games, in that it achieves both optimal regret (up to log factors) in single-agent learning and the optimal last-iterate convergence rate in multi-agent learning.
(with Lin Fan, J. Michael Harrison and Peter Glynn)