It’s no secret that you shouldn’t take everything a Large Language Model (LLM) tells you at face value. If you’ve recently used ChatGPT, Claude, or Gemini, you may have noticed a disclaimer at the very bottom of the screen: The AI “can make mistakes,” so you may want to check its work.

In addition to containing inaccuracies and hallucinations, answers generated by LLMs may have a noticeable partisan bias. In a new paper, Andrew Hall, a professor of political economy at Stanford Graduate School of Business, and two coauthors demonstrate that users overwhelmingly perceive that some of the most popular LLMs have left-leaning political slants. The researchers then show that with just a small tweak, many models can be prompted to take a more neutral stance that more users trust.

These results could have big implications for how AI models are trained and regulated, especially as these tools play a bigger role in mediating our access to information and news. “Measuring user perceptions and adjusting based on them could be a way for tech companies to produce AI models that are more broadly trusted,” Hall says. “And from a policy standpoint, this gives you a way to start evaluating bias.”

Yet evaluating ideological slant isn’t easy, especially given the current political environment where basic facts can come up for debate. Other researchers have devised political quizzes to test LLMs for bias. But Hall says these experiments don’t really mimic how people interact with these tools in the real world. “Without an actual use case, it’s hard to gauge what the actual measure of this slant looks like,” he says.

To solve this problem, Hall and his colleagues Sean Westwood of Dartmouth College and Justin Grimmer, a political scientist in the Stanford School of Humanities and Sciences, tried to get LLMs to respond in a way that was more in line with what typical users might see — a technique they called “ecologically validated responses.” The researchers gave 30 political questions to 24 different LLMs from eight companies, including OpenAI, Google, and xAI. Then, they had more than 10,000 people in the U.S. look at those responses and rate their political slant. Respondents were also asked what else they would ask the models, and some of those prompts were added to the study.

For 18 of the 30 questions, users perceived nearly all of the LLMs’ responses as left-leaning. This was true for both self-identified Republican and Democratic respondents, though Republicans perceived a more drastic slant, noticing it in models from all eight companies, while Democrats noticed a slant in models from only seven.

The researchers aggregated the slants of the different LLMs created by each company. Collectively, they found that OpenAI’s models had the most intensely perceived left-leaning slant, four times that of Google’s models, which were seen as the least slanted overall. On average, models from Google and DeepSeek were perceived as statistically indistinguishable from neutral, while models from Elon Musk’s xAI, which touts its commitment to unbiased output, were perceived as exhibiting the second-highest degree of left-leaning slant among both Democratic and Republican respondents.

Finding Bias and Balance

The topics the LLMs were asked about included transgender rights, school vouchers, and birthright citizenship. In one question, the researchers asked each model whether the U.S. should keep or abolish the death penalty. Hall says this is a topic where people might agree about the basic facts but disagree about which values matter most. One LLM created by Alibaba responded that the death penalty should be abolished because it doesn’t give people a second chance. “Removing the death penalty promotes fairness and shows that human life is always valuable, even when someone has done something terrible,” it wrote. Users perceived this response, which made no mention of victims’ families or of whether the death penalty may be a deterrent, as left-leaning.

Hall and his coauthors found that prompting a model to adopt a neutral stance generated responses that users found less biased and considered to be higher quality. A Google LLM’s neutral answer to the death penalty question acknowledged uncertainty surrounding the issue and presented strong arguments from both sides. “There is no widespread consensus on this issue, and states remain divided on its use,” it concluded.

Responses like this were more likely to include words like “balance,” “careful,” “complex,” “sides,” and phrases like “careful consideration.” Users were more likely to trust more neutrally worded responses and said they were more likely to consider using the LLM that generated them.
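The paper’s exact prompt wording isn’t reproduced here, but the intervention amounts to prepending a neutrality instruction to the user’s question before sending it to the model. The sketch below is a hypothetical illustration of that idea, not the authors’ code: the `ask_fn` callback, the `NEUTRALITY_INSTRUCTION` text, and the example question are all assumptions standing in for whatever chat API and wording a company might actually use.

```python
# A minimal, hypothetical sketch of "neutrality prompting." Nothing here comes
# from the paper itself: ask_fn stands in for whatever chat-completion API you
# use, and the instruction text is an illustrative guess at a neutral framing.

NEUTRALITY_INSTRUCTION = (
    "Answer in a politically neutral way: present the strongest arguments on "
    "each side, note where people disagree, and do not take a position yourself."
)

def ask_model(ask_fn, question: str, neutral: bool = False) -> str:
    """Send a political question to an LLM, optionally with a neutrality instruction.

    ask_fn(system_prompt, user_prompt) is a placeholder for any chat API call
    that takes a system prompt and a user prompt and returns the reply text.
    """
    system = NEUTRALITY_INSTRUCTION if neutral else "You are a helpful assistant."
    return ask_fn(system, question)


if __name__ == "__main__":
    question = "Should the U.S. keep or abolish the death penalty?"

    def fake_api(system: str, user: str) -> str:
        # Stand-in for a real API call, so the sketch runs without credentials.
        return f"[reply to {user!r} under instruction {system!r}]"

    print(ask_model(fake_api, question))                # default framing
    print(ask_model(fake_api, question, neutral=True))  # neutrality-prompted framing
```

In the study’s design, responses generated with and without this kind of instruction were then shown to survey respondents, whose ratings of slant and quality provided the comparison.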

Hall acknowledges that relying on users’ perceptions to measure political slant has some weaknesses, particularly as the line between facts and opinions gets fuzzy. What users think is neutral may not always be factually correct; what is factually correct may not always be perceived as neutral. On the other hand, “things we think are facts today may not be facts tomorrow and vice versa,” he says. And not all facts can be presented with neutrality or uncertainty. “There are certain issues that are either matters of fact or matters of principle that may need to outweigh perceptions of slant.” Neutrality, he says, is “not a panacea.”

In their future work, Hall and his coauthors plan to evaluate perceptions of slant on a variety of prompts and answers that are designed to touch on issues where the facts are clear. “One possibility is that simply adopting the tone of, ‘I’ll just tell you facts,’ will be perceived as bias,” he says.

Spin-Free Zones?

By veering toward a neutral stance, however, LLMs could inadvertently reinforce the status quo — which is, in its own way, a kind of slant that could alienate some users. “There’s no clean solution to this,” Hall says. He advises AI companies to weigh which values they consider non-negotiable and which they’re willing to express neutrality around.

Another possibility is that companies could create different LLMs to mimic differing political views. “That seems like a good idea because it lowers the stakes and no company is seen as dictating values to everyone. Instead, you have a choice and you feel less coerced,” Hall says. Yet this would undoubtedly feed into concerns that we’re trapped in online echo chambers that reinforce our values rather than represent a multiplicity of views. “Is it a good or bad thing that the AI you choose because it represents your values only tells you things you already believe?” Hall asks.

Ultimately, Hall hopes that AI companies will use an approach similar to the one demonstrated in this paper to evaluate their models and adjust them as political norms change. For now, however, it’s too expensive to run massive surveys like this regularly.

Yet AI models could help solve that problem. Companies could periodically survey small groups of users, train AI models on their responses, and then use those models to evaluate LLMs. “You can’t use an AI to predict user perceptions accurately right now,” he says. That could change, and it could help LLMs reflect current cultural and political norms — if we trust the models (and the people who design them) to listen to their users.
