Technology & AI

Language Lessons: How LLMs Are Transforming Research

Stanford GSB faculty are using AI to analyze text at unprecedented speed and scale.

May 11, 2026
It has become routine for researchers to interpret documents containing billions of words. (Illustration: Saiman Chow)

When President Donald Trump announced a wave of tariffs in early 2025, he launched a war of words. Threats and counterthreats flew across the Atlantic and Pacific in a swirl of escalation, retaliation, and reprieve. Economists were hard-pressed to get a clear picture of what was happening beyond the headlines. Solid numbers would not be available for weeks or months, and when they arrived, they would capture only part of the story.

“As economists, we would like to track how all this geoeconomic pressure gets applied,” says Antonio Coppola, an assistant professor of finance at Stanford Graduate School of Business and a faculty fellow at the Stanford Institute for Economic Policy Research (SIEPR). “But a lot of it is not going to show up in the standard tabular data sets.”

“It’s really exciting that AI tools designed to analyze language can be easily adapted to solve very different problems across the sciences.”
— Susan Athey

Specifically, Coppola and his colleagues at the Global Capital Allocation Project, based at Stanford GSB, wanted to know how businesses in the U.S. and around the world were responding to the shakeup. “Are they changing their prices? Are they adjusting their supply chains? Are they undertaking new R&D to develop products that fall short of export control criteria?”

The answers were out there. Yet they were embedded in a sprawling, messy source: tens of millions of words written and spoken by executives and analysts.

Teaming up with Matteo Maggiori, a professor of finance at Stanford GSB and senior fellow at SIEPR, and two colleagues at Yale and Columbia, Coppola assembled a tranche of earnings call transcripts and analyst reports — more than 780,000 documents from more than 21,000 companies over more than a decade. They then fed this text into large language models (LLMs) running on Stanford’s high-performance computing clusters.

The LLMs cut through the fog of the trade war to find rich evidence of spiking concerns and shifts in strategy in the weeks following Trump’s “Liberation Day” announcement in April 2025. More than 30% of all earnings calls mentioned negative impacts from tariffs, and more than 60% of calls from U.S. firms reported negative effects.
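The paper’s actual prompts and models are not reproduced in this article. As a rough sketch, a classification pipeline of this kind pairs a structured prompt with machine-readable output that can be tabulated across hundreds of thousands of documents. Everything below (the prompt wording, the JSON schema, the function names) is illustrative, and the model call itself is omitted:

```python
import json

def build_prompt(excerpt: str) -> str:
    """Construct a classification prompt for one earnings-call excerpt."""
    return (
        "You are analyzing an earnings-call transcript.\n"
        "Does the speaker discuss a negative impact from tariffs, sanctions, "
        "or export controls, even if those exact words are not used?\n"
        'Answer only with JSON: {"negative_impact": true/false, "topic": "..."}\n\n'
        f"Excerpt: {excerpt}"
    )

def parse_response(raw: str) -> dict:
    """Parse the model's JSON answer into a record suitable for tabulation."""
    record = json.loads(raw)
    return {
        "negative_impact": bool(record["negative_impact"]),
        "topic": record.get("topic", "unspecified"),
    }

# Any chat-completion API would slot in between these two steps; here we
# stand in a hypothetical model reply to show the round trip.
reply = '{"negative_impact": true, "topic": "supply-chain rerouting"}'
print(parse_response(reply))
```

Run over a corpus, records like these become the time series behind a figure such as “30% of calls mentioned negative tariff impacts.”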

In the not-so-distant past, research like this would have involved sending out surveys, searching for keywords, or recruiting an army of grad students. The work would be laborious and expensive, and the results might be superficial and outdated. By tapping into the power of pretrained language models, Coppola, Maggiori, and their colleagues were able to generate findings that were expansive, granular — and fast.

“We can extract this detailed information at a high frequency — and in almost real time,” Coppola says. (The latest data is available in an online dashboard.) Their study was further evidence that when it comes to scale, speed, and sophistication, AI-driven language analysis is leaving traditional research methods in the dust.

The Search for Meaning

From its earliest days, the development of artificial intelligence has been inseparable from the goal of getting computers to understand the complexities of human language. What most people now know as AI — chatbots that display dazzling linguistic fluency and can be instructed without code — is the most dramatic milestone in this quest.

“They just allow you to comb through a scale of content that no human coder could conceivably do.”
— Douglas Guilbeault

“When ChatGPT was released in November 2022 and exploded into the public consciousness, that was the culmination of a long process of developments in computational methods that can be traced back to the ’90s,” says Amir Goldberg, a professor of organizational behavior at Stanford GSB. A key breakthrough came in 2017 with the introduction of the transformer, a neural network architecture designed to learn linguistic patterns from massive quantities of text. Combined with advances in machine learning and an explosion of computing power, this made today’s generative AI systems possible and opened opportunities for research with unprecedented scope.

Over the past decade, it has become routine for researchers to use these quickly evolving tools to interpret bodies of text, or corpora, that may contain millions of documents or billions of words. “They just allow you to comb through a scale of content that no human coder could conceivably do,” says Douglas Guilbeault, an assistant professor of organizational behavior at Stanford GSB. And their abilities go beyond digesting text. LLMs in particular “have changed the game in terms of our ability to incorporate a wide range of information into statistical models,” says Susan Athey, PhD ’95, a professor of economics at Stanford GSB and a senior fellow at SIEPR.

Beyond digesting text at scale, language models can aggregate what might seem like a jumble of disconnected ideas into something coherent and quantifiable. This has made them a powerful tool for making sense of the unstructured data that, by some estimates, accounts for at least four-fifths of all new information being produced.

Goldberg, a sociologist and co-director of Stanford’s Computational Culture Lab, sees the “digital traces” of modern life — emails, chats, social media posts — as an informational treasure trove. “Culture is about the social processes by which meanings are negotiated and agreed upon,” he says. “The most prominent medium through which this negotiation happens is through linguistic exchange. Suddenly, this linguistic exchange became data that we could use to measure things that had been out of reach.”

In a recent study, Goldberg analyzed 14.7 million emails to measure the overlap between employees’ self-identity and their collective workplace identity. As more records are digitized, LLMs have also opened a portal into the past. When studying the origins of unconventional ideas that make their way into the mainstream, Goldberg examined 4.9 million congressional speeches and 4.2 million court decisions.

“Suddenly, we have an instrument that allows us to go back in time,” Goldberg says. “I can’t go back and ask people in the ’60s what they think. But the people in the ’60s left a lot of documents behind that I can analyze.”

More Than Words

While the architecture of LLMs is loosely inspired by the wiring of the human brain, their approach to language is fundamentally different. “From the perspective of the neural network, it’s just a sequence of numbers,” Goldberg says. “That these numbers represent text is immaterial.” Despite this purely mathematical approach to language, the models are remarkably sensitive to context clues and semantic subtleties. For example, in Coppola and Maggiori’s study, the LLM could tell when executives or analysts were discussing the effects of geoeconomic pressure even if they didn’t use the words “tariffs,” “sanctions,” or “export controls.”

“Suddenly, we have an instrument that allows us to go back in time.”
— Amir Goldberg

This ability to read between the lines has enabled researchers to glean more meaning from data, explains Ada Aka, an assistant professor of marketing at Stanford GSB. In a traditional study, a volunteer might be asked to rate a product on a numeric scale, generating a single data point. Yet if they are asked for verbal feedback, their response can be processed by an LLM, which may discern not only stated preferences but the emotions and reasons behind them.

“In just the same amount of time, maybe a few seconds more, what you get is quite rich contextual person-specific information,” Aka says. “With this context, we can almost quantify someone’s thoughts. Language data is so rich for learning everything from what people value to their expectations, their concerns, and their uncertainties.”

Language models can also pick up on connections and patterns that are not obvious to people. In a study on the memorability of real brand slogans, Aka used OpenAI’s GPT model to characterize the semantic relationships of 929 slogans across more than 1,500 dimensions. Without using human data, the model demonstrated how marketers might use similar predictive tools to select catchier slogans.
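The study’s actual embedding model and dimensions are not detailed here. A toy sketch can still show the underlying operation: each slogan becomes a vector, and semantic closeness is measured with cosine similarity. The four-dimensional vectors below are made-up stand-ins for the roughly 1,500-dimensional embeddings the study describes:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings of real slogans.
slogans = {
    "Just do it":            [0.9, 0.1, 0.3, 0.2],
    "Impossible is nothing": [0.8, 0.2, 0.4, 0.1],
    "Melts in your mouth":   [0.1, 0.9, 0.2, 0.7],
}
target = slogans["Just do it"]
ranked = sorted(
    (name for name in slogans if name != "Just do it"),
    key=lambda name: cosine_similarity(target, slogans[name]),
    reverse=True,
)
print(ranked[0])  # prints "Impossible is nothing": the closest under these toy vectors
```

In the real study, similarities like these feed a model that predicts memorability; the geometry, not the wording, carries the signal.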

Another feature of LLMs is their consistent and detached approach to analysis. “It’s not simply that no human would have the bandwidth to go through it all,” Guilbeault says. “There’s also an intrinsic cognitive limitation” — people bring their own individual sense of meaning into the equation, which affects how they interpret what they read. While language models are far from unbiased — Guilbeault has documented the gender and age stereotypes they pick up from their training material — he notes that they “can provide a bit more of an agnostic analysis.”

LLMs have also proved surprisingly useful for tackling problems that seem unrelated to language. In an upcoming Stanford GSB Quick Study video, Athey, the founding director of the Golub Capital Social Impact Lab, describes a recent study in which she deployed an LLM to answer a question that has long challenged economists: predicting people’s career paths. “My team and I wondered… Could the same architecture used to create breakthroughs like ChatGPT be adapted to predict your future occupation?”

With minimal coding, Athey and her colleagues used tens of thousands of resumes to fine-tune a publicly available LLM. By analyzing job titles and patterns in work histories, this approach (dubbed LABOR-LLM, or Language-Based Occupational Representations with Large Language Models) predicted job transitions more accurately than specialized models trained on much larger datasets. “To an LLM,” Athey explains, “a sequence of jobs is a sequence of words.”
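The key move in Athey's framing is serialization: a work history becomes a text sequence, so predicting the next job reduces to predicting the next words. The format below is an assumption for illustration; the LABOR-LLM paper’s actual input encoding may differ:

```python
def career_to_text(jobs: list[dict]) -> str:
    """Serialize a work history into the token sequence a language model sees.

    Field names ("year", "title") and the arrow separator are illustrative,
    not the paper's actual format.
    """
    return " -> ".join(f"{j['year']}: {j['title']}" for j in jobs)

resume = [
    {"year": 2015, "title": "Research Assistant"},
    {"year": 2018, "title": "Data Analyst"},
    {"year": 2021, "title": "Data Scientist"},
]
prompt = career_to_text(resume) + " -> 2024:"
print(prompt)
# A fine-tuned LLM would be asked to complete the next job title after
# "2024:", exactly as it would complete the next word in a sentence.
```

Fine-tuning on tens of thousands of such sequences teaches the model which transitions are plausible, which is why a general-purpose LLM can beat specialized occupational models here.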

Just as LLMs have introduced the public to the potential of artificial intelligence, they have opened new possibilities for research that would have been unimaginable just a few years ago. “It’s really exciting that AI tools designed to analyze language can be easily adapted to solve very different problems across the sciences, including those that people had never thought of as text problems,” Athey says. “And this is just the beginning.”

Word Problems

More examples of large-scale language analysis in Stanford GSB research

Michele Gelfand, a professor of organizational behavior, was part of a team that developed a “threat dictionary” by examining 120 years of digitized newspapers. This algorithmic tool tracked how collective responses to threats can predict political and economic shifts.

Michael Hannan, a professor of organizational behavior, emeritus, trained the BERT transformer model on 680,000 book descriptions from Goodreads. Its genre classifications closely correlated with human picks.

Ashley Martin, an associate professor of organizational behavior, used machine learning to analyze more than 43,000 corporate documents containing 1.2 billion words. She and her coauthors found that firms that appointed female CEOs shifted their language to associate women with leadership traits.

Using text-similarity tools, Robert Bartlett, a professor of finance (by courtesy), studied 4,758 VC term sheets, finding that startup contracts have become remarkably standardized over the past two decades.

Robb Willer, a professor of organizational behavior (by courtesy), and his colleagues estimated the political ideologies of 13 million Twitter users by examining 3.5 billion tweets.

By cross-referencing 1.2 million Glassdoor reviews and transcripts from earnings calls, Charles O’Reilly, a professor of organizational behavior, determined the personalities of 460 chief executives. He found that CEO personality is the cornerstone of organizational culture.

Mary Barth, a professor of accounting, emerita, deployed machine-based textual analysis to evaluate the quality of integrated reports from 125 firms.

Gregory Martin, an associate professor of political economy, and Shoshana Vasserman, an associate professor of economics, trained an algorithm to identify investigative reporting in 5.9 million newspaper articles, finding that it declined sharply following a wave of newsroom layoffs.
