Applying machine learning to field data revealed which college students are more likely to respond to reminders to apply for financial aid. | iStock/guoya/erhui1979

Running a classic randomized controlled trial is pretty straightforward: You set up a control group and a treatment group, apply the intervention to the treatment group, and then compare the two groups’ outcomes. This has become the standard way of seeing whether something is effective — whether it’s a new medicine or a social policy.

“When we think about policies that can improve people’s lives, especially in the past decade there’s been this paradigm that you see what works by running randomized trials,” says Jann Spiess, a professor of operations, information, and technology at Stanford Graduate School of Business and faculty affiliate at the Golub Capital Social Impact Lab.

It makes sense: What better way to determine whether a policy helps people than by observing it in action? However, this approach is resource intensive, Spiess notes. Trials are difficult to organize and run. And despite these costs, the results’ explanatory power may be limited, as they typically focus on average effects across a population, obscuring the impacts on individuals.

Enter artificial intelligence — specifically, using AI to crunch big datasets and generate unexpected predictions or categorizations. Machine learning algorithms have proven especially good at burrowing into data collected in the field and unearthing new details on not only how interventions work, but for whom. “Machine learning gives us the opportunity to essentially personalize treatments,” Spiess says.

In recent work with two GSB colleagues — economics professor Susan Athey, who directs the Golub Capital Social Impact Lab, and former postdoctoral scholar Niall Keleher — Spiess paired a machine learning algorithm with experimental data to study how to get more college students to apply for financial aid. (Spiess and Athey are also fellows at the Stanford Institute for Economic Policy Research.) What they found not only confounded expectations in this particular case, it also confirmed the potential power of applying this hybrid approach to research more broadly.

A Fine-Toothed Code

The researchers partnered with ideas42, a nonprofit that ran field experiments to see if small behavioral “nudges” might encourage students at the City University of New York (CUNY) to apply for federal financial aid. In these experiments, reminders sent by text or email resulted in a 6 percentage point increase in applications in 2017 and a 12 percentage point increase in 2018.

Yet these were average outcomes. To develop a more nuanced portrait of who responded to the nudges, Spiess, Athey, and Keleher trained a machine learning algorithm on the field results. They found that text and email messages were most effective for students who were already somewhat inclined to file for financial aid. Students who were not likely to apply were mostly unmoved by these gentle reminders. This more specific insight could help school administrators and policymakers avoid costly attempts to target people who probably wouldn’t respond.

“Going into this, we may have hoped the nudges work especially well for students who are unlikely to file. And based on the experiment’s generally promising results, we would have prioritized that group,” Spiess says. “Had we done that, we would have been pursuing exactly the wrong people.”

At the same time, machine learning alone would not be sufficiently instructive. As Spiess explains, a model could predict which people are most — or least — likely to apply for financial aid. Yet this would not demonstrate whether texts or emails actually help any of these groups. By combining the algorithm with the experiment, the researchers could estimate how strong the treatment effect was for distinct groups of students.
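The mechanics of that combination can be sketched in a few lines of code. The Python snippet below is a minimal illustration under made-up assumptions, not the researchers’ actual estimator: it fits one off-the-shelf model to the nudged students and one to the control group (a simple “T-learner”) and treats the gap between their predictions as an estimate of how much a reminder shifts each student’s chance of applying. The file name and the “applied” and “nudged” columns are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

df = pd.read_csv("field_experiment.csv")    # hypothetical export of the trial data
X = df.drop(columns=["applied", "nudged"])  # student covariates
y = df["applied"].to_numpy()                # outcome: filed for aid (0 or 1)
w = df["nudged"].to_numpy()                 # randomized treatment indicator (0 or 1)

# Fit separate outcome models on the nudged and control observations.
model_treated = GradientBoostingClassifier().fit(X[w == 1], y[w == 1])
model_control = GradientBoostingClassifier().fit(X[w == 0], y[w == 0])

# Estimated effect per student: predicted probability of applying with the
# nudge minus the predicted probability without it.
df["estimated_effect"] = (model_treated.predict_proba(X)[:, 1]
                          - model_control.predict_proba(X)[:, 1])
```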

Ultimately, the researchers concluded that the most effective policy is to target nudges to the middle of the group — students who are neither the most nor least likely to reapply for financial aid. At either end of this spectrum, the power of the nudge weakens, particularly for those who are the least likely to apply for aid.
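Continuing that hypothetical sketch, a targeting rule in the spirit of this conclusion might rank students by their predicted likelihood of applying without any nudge and send reminders only to the middle of that distribution. The quantile cutoffs below are illustrative, not values from the study.

```python
import numpy as np

# Predicted likelihood of applying with no nudge, from the control-group model above.
baseline = model_control.predict_proba(X)[:, 1]

# Target the middle of the spectrum; the 25th/75th percentile cutoffs are
# purely illustrative thresholds.
low, high = np.quantile(baseline, [0.25, 0.75])
df["send_reminder"] = (baseline >= low) & (baseline <= high)

# Sanity check: the estimated effect should be larger for the targeted group.
print(df.groupby("send_reminder")["estimated_effect"].mean())
```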

Lighting Up Blind Spots

Over the past 15 years, Athey and her lab have been working on this method of synthesizing experimental results and machine learning. What they’ve produced promises to improve both the process of experimentation and its outcomes.

One of the central criticisms of experiments is that they can lack “external validity” — their findings may not apply in a different context. If an experiment is conducted in India, will its results hold in Kenya, or is there something distinct about the setting or the subjects that accounts for the findings? Are policies that target college students in New York City transferable to their peers in California?

Here, too, machine learning can help close the gaps. “There are lots of differences across geographies,” Athey says. “If these are macro differences, something systemic, then this method admittedly can’t help. But if there are micro differences, like a different distribution of income or age, then we can use the algorithm to adapt the treatment effect estimates to a particular population.”

In other words, if the machine learning algorithm is trained on experimental outcomes and finds that effects vary with readily measurable characteristics — like gender or, in the case of the CUNY students, an inclination to apply for financial aid — then those results can likely be mapped to other places, even if the populations look dissimilar. The machine learning model, in essence, helps address concerns about external validity by adapting the results from one population to another.
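Under the same hypothetical setup as the earlier sketch, that adaptation step is conceptually simple: evaluate the models fit on the experimental data at the covariates of the new population and average the predictions. The second data file is made up, and this assumes the new population’s characteristics are measured the same way as in the original experiment.

```python
import pandas as pd

# Covariates for a hypothetical second campus, with the same columns as X above.
new_pop = pd.read_csv("other_campus.csv")

# Evaluate the models fit on the CUNY experiment at the new covariate values and
# average: a rough projection of the nudge's effect in the new population.
projected_effect = (model_treated.predict_proba(new_pop)[:, 1]
                    - model_control.predict_proba(new_pop)[:, 1])
print("Projected average effect:", projected_effect.mean())
```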

This hybrid approach has the potential to make experimentation less expensive by supporting faster iteration. As an experiment is running, machine learning can discern what works and suggest ways to fine-tune interventions in real time for maximum impact.
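One standard way to support that kind of real-time fine-tuning, though the article does not specify the lab’s method, is a bandit algorithm such as Thompson sampling: keep a running posterior over how well each variant of an intervention performs and steer each new participant toward the variant that currently looks best. The sketch below uses illustrative message variants and uniform priors.

```python
import numpy as np

rng = np.random.default_rng(0)
variants = ["text", "email"]
successes = {v: 1 for v in variants}   # Beta(1, 1) priors over each variant's
failures = {v: 1 for v in variants}    # probability of prompting an application

def choose_variant():
    """Sample an application rate for each variant and send the highest draw."""
    draws = {v: rng.beta(successes[v], failures[v]) for v in variants}
    return max(draws, key=draws.get)

def record_outcome(variant, applied):
    """Update the chosen variant's posterior once the student's outcome is known."""
    if applied:
        successes[variant] += 1
    else:
        failures[variant] += 1
```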

For policymakers, this adaptable, targeted process provides the ability to move beyond catchall approaches that are often costly and only marginally effective. (Though Athey notes that this is a “data-hungry” method: Larger samples are necessary for more granular results.)

The method, Spiess says, also illuminates blind spots — such as the people who are left behind by certain interventions. In the case of New York’s college students, the study revealed that simple reminders don’t work for those most at risk of losing their financial aid. This is precisely the group that policymakers might want to target.

And while it’s easy to imagine how this technology can drive digital solutions — better email reminders, for instance — Athey is more excited about using it to improve interactions between people.

“So much work involves humans helping other humans, but it’s really difficult for the helper, the coach, to have memorized all the details and gathered all the knowledge they need to give customized advice or treatments,” she says. The approach demonstrated here could support more personalized attention. “That’s the best of both worlds. If the computer is supporting the coach or the teacher or the helper, then they can have all the information they need to offer the best options.”
