The Research Revolution

November 20, 2020

By Dylan Walsh

“We see life cursed by drink, brutality and vice, and loaded down with ignorance and poverty, while industry is choked by its own blind struggles, and education is still painfully mounting, and too often slipping back from, the first rungs of its ladder.”

So wrote the British shipping magnate Charles Booth in 1903 as he grimly concluded Life and Labour of the People in London, his landmark study of living conditions in what was then the world’s largest city.

The publication is considered one of the first significant surveys of social issues, and both its aims and its methods went on to influence government and academic researchers throughout the 20th century. Booth gathered the data through in-person interviews, many of which he conducted himself at night and on weekends, when time permitted.

The project took him 15 years to complete.

For those who came after Booth and made it their business to understand the cultural and economic details of how people live their lives, the challenge of data collection persisted. During the Great Depression and New Deal, for instance, the U.S. government sent interviewers flocking across the country to collect information on the economy through face-to-face meetings with denizens of city, town, and country. This was no minor undertaking. World War II witnessed a similar time- and resource-intensive process, though surveys were sometimes conducted by mail.

Compared to all of that, researchers today have it easy. When Paul Oyer, a labor economist and senior associate dean at Stanford Graduate School of Business, wanted to study how family history influences the trajectory of entrepreneurs, he and a colleague simply downloaded data on every Norwegian citizen of working age — a population comparable to London in 1900 — sorted it to fit their question, then dug into the analysis.

“We looked at what jobs they had, what jobs their fathers had, how much money they earned, and so on,” Oyer says. “The fact that somebody in Norway gathered this information so that we can use it is a great advance in social science research.”

A Fine-Grain, Superabundant Record

Great advance, indeed. The superabundance of data has become a defining feature of our time. There are now more bytes of digital information floating about than there are stars in the observable universe. The expansive digital footprint of our lives provides a fine-grain record of our actions, both trifling and significant. And all of this information can be interpreted with machine learning algorithms that, if designed properly, discover patterns and relationships in mountains of unstructured data — a task wholly unmanageable by humans alone.

“These tools have opened up an entirely new set of questions that we couldn’t ask before,” Oyer says. “People have always been curious about certain things, but in the old days we’d throw up our hands and say, ‘There’s no way to know.’ Now we’re in position to know.”

And beyond the ability to answer new questions through straightforward data crunching at spectacular scale, a more profound change may be underway, transforming not just old methods of inquiry, but possibly the fundamentals of inquiry itself.

“In some ways, machine learning approaches and the availability of data allow us to rethink how science is done,” says Amir Goldberg, an associate professor of organizational behavior at Stanford GSB who uses huge datasets to explore institutional culture.

For centuries, the standard approach to science has been to develop a hypothesis, run a test on a specific sample, and then look at the results. If you find what you hypothesized, a bit of statistical analysis can then gauge how likely it is that those results arose from random chance alone.
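As a minimal sketch of that last step, here is a permutation test, one common way to ask whether a difference between two groups could plausibly be due to chance. The function name `permutation_p_value` and the sample numbers are hypothetical, chosen only for illustration; they do not come from any study described here.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one would arise if group labels were assigned at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any real link between label and outcome
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical outcomes for a treated group and a control group
treated = [12.1, 13.4, 12.9, 14.2, 13.8, 13.1, 14.0, 12.7]
control = [11.2, 11.9, 12.0, 11.5, 12.3, 11.1, 11.8, 12.2]
print(permutation_p_value(treated, control))
```

A small p-value means randomly relabeled data rarely produce a gap as large as the observed one. It says nothing about whether the hypothesis was chosen fairly in the first place.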

In his 2015 paper “In Defense of Forensic Social Science,” Goldberg notes how easily this process can go awry, likening it to detectives showing up at a crime scene with a suspect in mind, then hunting only for evidence that will confirm their hunch. In several disciplines, a cavalier approach to the scientific method, coupled with sophisticated use of statistics, has led to a crisis of replication, in which substantial numbers of past results are not holding up under scrutiny — the equivalent of detectives getting caught pinning crimes on innocent people.


Machine learning, Goldberg says, instead allows researchers to take an investigative approach more akin to proper forensics: Examine all of the evidence available, then weigh the likelihood of different hypotheses to find the most probable one.

“Rather than expecting something and testing for that hypothesis, you basically analyze all the data, generate millions of hypotheses, and then figure out which are most consistent with the data,” Goldberg says. “This isn’t a fail-proof method, and there are plenty of flaws and challenges to consider, but if taken to the extreme, it can fundamentally change how we do science.”
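A toy version of “generate many hypotheses, then keep the ones most consistent with the data” can be sketched by screening hundreds of candidate variables against one outcome. The sketch also shows one of the flaws Goldberg alludes to: with enough candidates, some pure-noise variables look significant by chance unless the threshold is tightened, here with a simple Bonferroni correction (dividing the cutoff by the number of tests). All data are simulated, the p-values rely on a large-sample normal approximation, and every name is hypothetical.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def approx_p_value(r, n):
    """Two-sided p-value, assuming r * sqrt(n) is roughly standard normal
    when there is no real association (a large-sample approximation)."""
    z = abs(r) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2))


def screen(outcome, candidates, alpha=0.05):
    """Test every candidate; return ranked (p, name) pairs plus the names
    passing a naive cutoff and a Bonferroni-corrected cutoff."""
    n = len(outcome)
    ranked = sorted((approx_p_value(correlation(xs, outcome), n), name)
                    for name, xs in candidates.items())
    naive = [name for p, name in ranked if p < alpha]
    bonferroni = [name for p, name in ranked if p < alpha / len(candidates)]
    return ranked, naive, bonferroni


rng = random.Random(3)
n = 200
real = [rng.gauss(0, 1) for _ in range(n)]
outcome = [0.5 * x + rng.gauss(0, 1) for x in real]  # only "real" matters
candidates = {"real": real}
for k in range(499):
    candidates[f"noise_{k}"] = [rng.gauss(0, 1) for _ in range(n)]

ranked, naive, bonferroni = screen(outcome, candidates)
print(ranked[0][1], len(naive), len(bonferroni))
```

Under these assumptions, the naive 0.05 cutoff typically admits a couple dozen noise variables alongside the real one, while the corrected cutoff keeps essentially only the real one.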

Getting Closer to Reality

In the past, much research — particularly behavioral research — rested on a necessary foundation of artifice. “It’s extremely difficult to record behavior in the world, so we set up carefully controlled experiments in the lab with a small number of people and we recreate the conditions that we want to study,” says Michal Kosinski, an associate professor of organizational behavior at Stanford GSB.

But the process is expensive and lacks what academics call “ecological validity”: the degree to which findings capture the finely shaded nuance of behavior in the real world. Practical and ethical considerations also preclude the study of many important phenomena, like depression and extremism, that can’t be induced in study participants.

“But now we all carry around these little devices that essentially record our lives 24/7,” Kosinski says. Smartphones and computers have opened a window through which researchers can more unobtrusively observe and test previously obscured aspects of human psychology and behavior.

Nor are the insights these technologies provide trivial. A 2015 study by Kosinski and two coauthors used nothing more than Facebook likes to assess personality traits such as introversion, conscientiousness, and neuroticism. The researchers found that by analyzing just 10 “likes,” a computer could judge somebody’s personality more accurately than their coworkers could. With 70 “likes,” it did better than friends; with 150, better than family members; and with 300, better than a spouse.

This vast new ocean of information is complemented by the ability of machine learning to interpret novel types of data, like text, audio, and images — a capability that provides academic studies a more fully realized picture of the world.

Consider recent work by Goldberg. With the permission of the job-recruitment site Glassdoor, he had a computer “read” half a million reviews that employees had written about the companies where they work. Using topic modeling, a technique that infers what a text is about from the constellation of words that tend to appear together, the algorithm identified the cultural characteristics of different companies.
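Full topic models are more involved than fits here, but the underlying intuition, that a word’s meaning can be read from the words around it, can be sketched with simple co-occurrence counts. This is a swapped-in and much simpler technique than the topic modeling used in the study, and the “reviews” are invented: words that appear in the same contexts end up with similar count vectors, which cosine similarity detects.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented snippets standing in for employee reviews
reviews = [
    "managers reward teamwork across every team",
    "managers reward collaboration across every team",
    "pay raises depend on salary bands",
    "pay raises depend on bonus targets",
]
vecs = cooccurrence_vectors(reviews)
print(cosine(vecs["teamwork"], vecs["collaboration"]))  # 1.0: identical contexts
print(cosine(vecs["teamwork"], vecs["salary"]))         # 0.0: no shared context
```

No one told the code that “teamwork” and “collaboration” are related; it inferred that from their surroundings alone.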

Does Company A value cooperation and Company B competitiveness? What about short-term versus long-term goals? A particular kind of person? A diversity of people?

The algorithm combed through a huge volume of unstructured text and, on its own, pulled these subtleties from every individual review. (Goldberg found that valuing diversity can undermine a firm’s efficiency but increase its innovation.)


Illustration by Josh Cochran

Before machine learning, Goldberg says, analyzing 500,000 reviews with any fidelity would have been impossible. There were two alternatives. Either researchers would have selected a much smaller sample and had humans (i.e., undergrad and grad students) manually match each review against a preconceived set of cultural types, or they would have analyzed the entire set of reviews using a “very, very, very coarse keyword analysis, where every mention of a word refers to only one thing.”

This second possibility has obvious pitfalls, given that the meaning of a word often varies by context: “Shot” means one thing when talking about medicine, another when talking about basketball, another when talking about crime, and another when talking about bars. Such depth of understanding is washed out when simply using keywords.
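The contrast between keywords and context can be sketched in a few lines. The keyword counter treats every “shot” alike, while the context scorer picks a sense by checking which hand-made list of cue words overlaps most with the words near “shot” (a simplified, Lesk-style heuristic). The cue lists and function names are hypothetical; real systems learn such associations from data rather than from hand-written lists.

```python
# Hypothetical cue words for each sense; a real system would learn these from data
SENSE_CUES = {
    "medicine": {"nurse", "doctor", "flu", "vaccine", "clinic", "arm"},
    "basketball": {"buzzer", "hoop", "court", "missed", "points", "three"},
    "crime": {"police", "suspect", "gun", "fired", "victim", "wounded"},
    "bar": {"whiskey", "tequila", "bartender", "drink", "glass", "poured"},
}


def keyword_count(sentence, target="shot"):
    """The coarse approach: every mention counts the same, whatever it means."""
    return sum(1 for t in sentence.lower().split() if t.strip(".,!?") == target)


def guess_sense(sentence, target="shot", window=4):
    """Pick the sense whose cue words overlap most with the words near `target`."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    if target not in tokens:
        return None
    i = tokens.index(target)
    context = set(tokens[max(0, i - window): i + window + 1]) - {target}
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


examples = [
    "The nurse gave me a flu shot in my arm.",
    "He missed the shot at the buzzer.",
    "Police say the suspect fired one shot.",
    "The bartender poured a shot of tequila.",
]
for sentence in examples:
    print(keyword_count(sentence), guess_sense(sentence), sentence)
```

The keyword count is 1 for every sentence, while the context scorer returns medicine, basketball, crime, and bar in turn.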

In Goldberg’s study, for instance, employees’ reviews of their companies may include the word “diversity,” but basic keyword analysis can’t tell whether they’re saying the company loves it or detests it.

“These machine learning algorithms are inferior to the way in which humans understand language,” Goldberg says, “but they are far more scalable and far superior to the brute-force reduction of text based on this or that keyword.”

The Robot as Researcher

The benefits of applying machine learning to large datasets were not, and still are not, always obvious. For six years, Susan Athey, a professor at Stanford GSB, served as Microsoft’s consulting chief economist, where she worked alongside search engineers and waded into the world of machine learning.

She returned to campus advocating for the power of the tool, but her colleagues in the social sciences were initially dismissive, citing the technology’s inability to answer the kinds of causal “what-if” questions that interested them: What would happen if there were more innovation? What if we increased the minimum wage, if New York City instituted congestion pricing for commuters, or if corporate tax rates were raised?

Machine learning is instead good at what might be called backward-looking prediction. Athey uses a hotel to demonstrate the point: Suppose you want an algorithm that predicts hotel occupancy based on room price. If you train it on historical data, a basic trend will emerge: high prices correlate with high occupancy, because hotels tend to raise prices precisely when demand is strong. That’s a successful predictive model, but it would make an abysmal causal model. Ask it, “What happens if I raise prices?” and the algorithm might posit, incorrectly, that the best way to improve occupancy rates is to raise prices.
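The hotel example can be reproduced with a small simulation. The numbers are invented: a demand shock (think busy season or a local event) pushes both price and occupancy up, while the true, built-in effect of a price increase on occupancy is negative. The sketch assumes, unrealistically, that the demand shock is directly observed; controlling for it by partialling it out of both variables (the Frisch-Waugh-Lovell result for linear regression) recovers the built-in effect, while the naive price-only fit gets the sign wrong.

```python
import random


def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def residualize(ys, zs):
    """Remove the part of ys that a straight line in zs can explain."""
    b = ols_slope(zs, ys)
    mz, my = sum(zs) / len(zs), sum(ys) / len(ys)
    return [y - my - b * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(42)
demand, price, occupancy = [], [], []
for _ in range(5_000):
    d = rng.gauss(0, 1)                   # demand shock: busy season, local events
    p = 150 + 40 * d + rng.gauss(0, 10)   # the hotel charges more when demand is high
    # Built-in causal effect: each extra dollar lowers occupancy by 0.2 points
    o = 70 + 15 * d - 0.2 * (p - 150) + rng.gauss(0, 3)
    demand.append(d)
    price.append(p)
    occupancy.append(o)

naive = ols_slope(price, occupancy)  # the "predictive" fit on price alone
adjusted = ols_slope(residualize(price, demand), residualize(occupancy, demand))
print(f"naive slope: {naive:+.3f}, adjusted slope: {adjusted:+.3f}")
```

Under these made-up parameters the naive slope should come out near +0.15 (higher prices, higher occupancy), while the adjusted slope should land close to the -0.2 built into the simulation. In real data, demand is rarely observed this cleanly, which is why causal questions need more than a good predictive fit.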

This is why Athey’s colleagues didn’t immediately jump on board. “In the social sciences over the last 20 years, our empirical research has been 80% to 90% about cause and effect and only 10% to 20% about prediction and description,” Athey says. “So off-the-shelf machine learning didn’t look immediately applicable.”

Predictive models, though, are often an integral part of answering what-if questions, and so Athey has made it her mission to carefully merge the benefits of machine learning with the core practices of causal social science research.

Say, for example, she wanted to study what would happen to consumer demand for a particular product or product category if prices changed. Answering this question requires first understanding baseline consumer demand over time: How do variables other than price affect it? Does it fluctuate by season? By day of the week? By type of weather? Is there a relationship between demand and more unusual variables like political climate or gas prices?

In the past, Athey would have thought about which variables mattered most and then designed a model to control for them — a necessarily limited endeavor given the number of variables and complexity of relationships a human can reasonably consider.

Machine learning algorithms are a different story.

Over the past several years, Athey and like-minded colleagues have been retrofitting one econometric model after another with machine learning algorithms. The result, across the board, has been better predictive performance.

“I like to think of it as a robotic research analyst that can test billions of functional relationships across thousands of variables and find the one that works best,” she says. “We haven’t yet figured out how to do anything conceptually deep with machine learning, but we’ve found ways to make a robotic assistant that works really well.”
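A drastically scaled-down version of that “robotic research analyst” can be sketched as a search over candidate functional forms, keeping whichever predicts best on data held out from fitting. The candidate list, data, and function names are hypothetical, and real machine learning methods search vastly richer spaces of functions; the sketch only shows the select-by-held-out-performance idea.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for ys ~ a + b * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def held_out_error(transform, train, test):
    """Fit y ~ a + b * transform(x) on train, then score mean squared error on test."""
    a, b = fit_line([transform(x) for x, _ in train], [y for _, y in train])
    return sum((y - (a + b * transform(x))) ** 2 for x, y in test) / len(test)


# Hypothetical candidate functional forms
CANDIDATES = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "log": math.log,
    "sqrt": math.sqrt,
}

rng = random.Random(7)
xs = [rng.uniform(1, 10) for _ in range(400)]
# Simulated outcome that truly depends on the square of its driver, plus noise
data = [(x, 2 + 0.5 * x * x + rng.gauss(0, 1)) for x in xs]
train, test = data[:300], data[300:]

scores = {name: held_out_error(f, train, test) for name, f in CANDIDATES.items()}
best = min(scores, key=scores.get)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

Scoring on held-out data rather than on the training data is the key design choice: it rewards forms that generalize, not ones that merely memorize.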

The Black Box Dilemma

One central, well-recognized concern with this robotic assistance is that many machine learning algorithms operate as black boxes. Too often, researchers cannot explain how an algorithm arrives at its results. Predictive models developed by hand remain legible to humans, their gears plainly visible. Not so when computers assume the task.

For Athey, this presents an important academic challenge. “Say I want to run one of these algorithms but my goal is ultimately discovery,” Athey asks. “What if I don’t just want the best answer, but I want to gain understanding as well?”

Such efforts are stymied when machine learning can’t show its work — a concern that echoes the demands of middle school math teachers. Part of Athey’s agenda is to remedy this problem by making the results of machine learning comprehensible not simply as solutions, but as processes that lead to solutions.
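One widely used way to make a black box show some of its work is permutation importance: shuffle one input at a time and measure how much the model’s error grows. This is not presented as Athey’s approach, just a common, model-agnostic technique; the model, feature names, and data are all hypothetical.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared gap between the model's predictions and the targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=5, seed=0):
    """Average increase in error when each feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def opaque_model(row):
    """Hypothetical stand-in for a black box: we only call it, never inspect it."""
    strong, weak, _unused = row
    return 0.03 * strong ** 2 + 0.5 * weak


rng = random.Random(1)
rows = [(rng.uniform(20, 80), rng.uniform(18, 35), rng.uniform(5, 13)) for _ in range(500)]
targets = [opaque_model(r) + rng.gauss(0, 1) for r in rows]
print([round(x, 2) for x in permutation_importance(opaque_model, rows, targets)])
```

The unused third input scores exactly zero, because shuffling it cannot change any prediction. Scores like these describe what the model relies on, not why the world works that way.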

This work holds deep practical implications. Mohsen Bayati, an associate professor of operations, information, and technology at Stanford GSB who specializes in health care, has seen algorithms make incredibly precise recommendations in subjects ranging from disease diagnosis to hospital staffing levels.

In many such applications, machine learning regularly outperforms even experienced medical practitioners by a wide margin. But in the health care industry, where decisions can have immediate life-or-death consequences, people are hesitant to take a course of action without knowing why it was recommended.

“This is a new and interesting challenge for behavioral researchers,” Bayati says. “We need to find a way to present the complexities of this black box recommendation to a hospital manager or clinical staff so she can make the most informed decision.”

Incubating the “New Big Thing”

These new tools are reshaping not only the kinds of questions asked and the methods used to pursue answers, but the basic infrastructure of research at institutions like Stanford GSB.

Faculty members, for instance, are reimagining the ways in which they serve as mentors and advisors and are drifting away from the traditional model of working closely with only a few students at a time. Acquiring, organizing, cleaning, and then plumbing the depths of huge datasets demands a different approach, comparable to the hard-science model of labs managed by a principal investigator.

Consider the Golub Capital Social Impact Lab, which Athey guides in helping social sector organizations find efficiencies through machine learning. It leans on a breadth of expertise to achieve its mission: it hosts eight postdocs, 24 doctoral students, and two dozen master’s students, and faculty from marketing, finance, economics, engineering, computer science, education, and sociology are all affiliated with the group.

“Many social scientists are moving in this direction, building lab-like structures with a team of people working under their guidance,” says Jonathan Levin, dean of Stanford GSB, who during his tenure has prioritized the advancement of this kind of research. “And because these groups need more resources to acquire data and to work with companies and government agencies, we on the administrative side have tried to build an engine that will make Stanford the best place to do this kind of data-oriented research.”

Like a root system for a tree, Levin and the Stanford GSB staff have established background infrastructure that helps big data research flourish. This includes a behavioral lab for on-campus experimentation, an expansive library research hub, and access to the Data, Analytics, and Research Computing team, which provides an array of support services, from code optimization to timeslots on cloud-based supercomputers.

Greater flexibility is also being built into research funds, allowing faculty to work with PhD students and postdocs from departments across the university. And a data-acquisition group helps students and professors negotiate licenses and contracts with data providers, the private sector, and government agencies — a novel component of big data research.

“There are important issues that must be acknowledged — ensuring, for example, the integrity and credibility of research when you’re collaborating with companies that use proprietary data,” Levin says. “We’re still working through what the framework should be, when we partner with external organizations, for making sure research is replicable, that confidentiality is well protected, and that academic freedom is preserved.”

Despite such challenges, the prevailing mood is one of excitement, given how rarely revolutionary new research tools disrupt academia. Levin points to the adoption of formal modeling by the social sciences after World War II. Forty years later came the development of methods for causal inference. And now, he says, we’re witnessing the dawn of big data and machine learning in the social sciences.

“Every once in a while a new big thing comes along,” Levin says, “and this is the new big thing.”