Publications from faculty affiliated with the lab.
Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations
Researchers often use artificial data to assess the performance of new econometric methods. In many cases, the data-generating processes used in these Monte Carlo studies do not resemble real data sets and instead reflect many arbitrary decisions…
Sufficient Representations for Categorical Variables
Many learning algorithms require categorical data to be transformed into real vectors before it can be used as input. Often, categorical variables are encoded as one-hot (or dummy) vectors. However, this mode of representation can be…
Balanced Linear Contextual Bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation…
Synthetic Difference in Differences
We present a new perspective on the Synthetic Control (SC) method as a weighted least squares regression estimator with time fixed effects and unit weights. This perspective suggests a generalization with two-way (both unit and time) fixed…
Generalized Random Forests
We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman [Mach. Learn. 45 (2001) 5–32]) that can be used to fit any quantity of interest identified as…
Estimation Considerations in Contextual Bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult…
Offline Multi-Action Policy Learning: Generalization and Optimization
In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as…
Economists (and Economics) in Tech Companies
As technology platforms have created new markets and new ways of acquiring information, economists have come to play an increasingly central role in tech companies – tackling problems such as platform design, strategy, pricing, and policy. Over…
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Many scientific and engineering challenges—ranging from personalized medicine to customized marketing recommendations—require an understanding of treatment effect heterogeneity. In this article, we develop a nonparametric causal forest …
Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data
We estimate a model of consumer choices over restaurants using data from several thousand anonymous mobile phone users. Restaurants have latent characteristics (whose distribution may depend on restaurant observables) that affect consumers’ mean…
Approximate Residual Balancing: Debiased Inference of Average Treatment Effects in High Dimensions
There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that the treatment assignment be as good as random conditional on…
Stable Predictions across Unknown Environments
In many important machine learning applications, the training distribution used to learn a probabilistic classifier differs from the testing distribution on which the classifier will be used to make predictions. Traditional methods correct the…
Exact P-values for Network Interference
We study the calculation of exact p-values for a large class of non-sharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network. The class includes null hypotheses that…
Sampling-Based vs. Design-Based Uncertainty in Regression Analysis
Previously titled: Finite Population Causal Standard Errors
Consider a researcher estimating the parameters of a regression function based on data for all 50 states in the United States or on data for all visits to a website. What…
Estimating Average Treatment Effects: Supplementary Analyses and Remaining Challenges
There is a large literature on semiparametric estimation of average treatment effects under unconfounded treatment assignment in settings with a fixed number of covariates. More recently, attention has focused on settings with a large number of…
Beyond Prediction: Using Big Data for Policy Problems
Machine-learning prediction methods have been extremely productive in applications ranging from medicine to allocating fire and health inspectors in cities. However, there are a number of gaps between making a prediction and making a decision,…
Context Selection for Embedding Models
Word embeddings are an effective tool to analyze language. They have been recently extended to model other types of data beyond text, such as items in recommendation systems. Embedding models consider the probability of a target observation (a…
Matrix Completion Methods for Causal Panel Data Models
In this paper we develop new methods for estimating causal effects in settings with panel data, where a subset of units are exposed to a treatment during a subset of periods, and the goal is estimating counterfactual (untreated) outcomes for the…
Structured Embedding Models for Grouped Data
Word embeddings are a powerful approach for analyzing language, and exponential family embeddings (EFE) extend them to other types of data. Here we develop structured exponential family embeddings (SEFE), a method for discovering embeddings that…