Artificial Intelligence, Cultural Diversity, and a Giant “Bag of Words”
How machine learning helped researchers sort through 500,000 Glassdoor reviews to weigh the benefits of workplace diversity.
It begins with a question. In Amir Goldberg’s case, it was an old question, long-debated: How does the diversity of a firm’s culture affect corporate performance?
Cultural diversity could be a great advantage, inspiring creativity through a rich and varied set of ideas. But one can just as easily imagine diversity nudging companies the other way, driving a wedge of disagreement between employees that hinders performance.
So which is it, Goldberg wondered?
Researchers interested in this question would traditionally list the dimensions of corporate culture they consider most important (a company can be competitive or cooperative, formal or informal) and distill those variables into a survey sent to thousands of employees at different companies.
In return, they would get a neatly structured table of information: Employee A thinks Google is competitive and innovative, while employee B thinks Walmart is bureaucratic and formal. And so on.
There are two fundamental problems with this method, Goldberg explains. First, researchers impose their own narrow set of cultural types on an otherwise messy system. Second, people aren’t actually very good at answering surveys. They interpret questions in unexpected ways. They give answers that they think the researcher wants to hear. And they sometimes guess because they don’t know what to say.
Goldberg wanted to take a different approach. He contacted Glassdoor, a job and recruitment site on which employees anonymously and publicly review the companies where they work. How, Goldberg wondered, do people talk about the cultures of their companies when unconstrained by a survey? Do employees in the same company agree on the culture of the workplace? If not, in what ways do they disagree?
Goldberg and two coauthors, Matthew Corritore from McGill University and Sameer Srivastava from UC Berkeley’s Haas School of Business, collected roughly 500,000 reviews from 492 publicly traded companies over a seven-year span. From this huge set of disorganized data — a scale and timeframe completely beyond the reach of traditional surveys — Goldberg set out to extract and classify discussions of culture. He sought needles in the haystack.
Throughout this process, Goldberg says, he received support from the specialists at Stanford GSB Data, Analytics, and Research Computing (DARC). They helped to ensure that access to the data complied with the requirements set out by Glassdoor. For other projects on which Goldberg has worked, DARC has helped navigate the contractual labyrinth of third-party data collection, construct the internal infrastructure needed for analysis, and structure raw data into something usable: complicated work made easier with their help.
Done manually, the analysis of a half-million Glassdoor reviews would have been an impossibly laborious task. So Goldberg turned to a machine learning approach known technically as Latent Dirichlet Allocation topic modeling; more informally, it’s the “bag of words” approach.
The only constraint placed on the algorithm was how many topics it should look for. In this case, Goldberg might assume there are 50 relevant cultural types. Or perhaps 100. (This guess can be refined over time to find the number with the most explanatory power.) Once the algorithm knows how many topics to look for, it scans every document and builds what is, in essence, a gigantic spreadsheet of how likely words are to occur together in the same review. Without actually understanding what these words refer to, the algorithm can classify different clusters of co-occurring words into one or another cultural bucket.
“Rather than imposing possibilities top-down, the algorithm inductively infers categories without human input,” Goldberg says. “No human understanding is happening; most basically, this is a statistical model that looks for words that tend to co-occur.”
Goldberg and his collaborators first trained the algorithm on roughly one million sentences containing the word “culture” or a close synonym (environment, atmosphere, attitude, climate, value, philosophy, belief). What words occurred around these terms? The training yielded a reliable model of distinct cultural categories, which the researchers then applied to every sentence in every review, thereby pinpointing discussions of a company’s culture whether explicit or not.
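The corpus-selection step above amounts to a keyword filter. A minimal sketch, assuming the synonym list quoted in the article; the sentence-splitting regex and exact-match rule are simplifying assumptions, not the researchers’ actual preprocessing.

```python
# Sketch of the training-corpus selection: keep only sentences that
# mention "culture" or a close synonym (list from the article).
# Sentence splitting and exact word matching are illustrative heuristics.
import re

CULTURE_TERMS = {
    "culture", "environment", "atmosphere", "attitude",
    "climate", "value", "philosophy", "belief",
}

def culture_sentences(review: str) -> list[str]:
    """Return the sentences of a review that contain a culture term."""
    sentences = re.split(r"(?<=[.!?])\s+", review)
    keep = []
    for s in sentences:
        words = set(re.findall(r"[a-z]+", s.lower()))
        if words & CULTURE_TERMS:
            keep.append(s)
    return keep

review = ("Pay is decent. The culture is very competitive! "
          "Managers foster a supportive atmosphere.")
print(culture_sentences(review))
# → ['The culture is very competitive!',
#    'Managers foster a supportive atmosphere.']
```

Run over every review, a filter like this yields the keyword-anchored training sentences; the trained model can then score all sentences, including those that discuss culture without using any of these words.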
This analysis, again, covered nearly 500 publicly traded companies, which allowed Goldberg to examine two central questions. First, what do people within a given company say about its culture? Do they agree on the culture of their workplace? Do they think it embraces a broad array of cultures? Second, what is that company’s return on assets, a proxy for its effectiveness in the market?
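Return on assets is the standard profitability ratio: net income divided by total assets. A one-line sketch with made-up figures:

```python
# Return on assets (ROA): net income / total assets.
# The dollar figures below are invented for illustration.
def return_on_assets(net_income: float, total_assets: float) -> float:
    """ROA = net income / total assets."""
    return net_income / total_assets

print(return_on_assets(2.5e9, 50e9))  # → 0.05, i.e. 5%
```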
“We were not interested, substantively, in the culture of any particular firm,” Goldberg says. “We were interested in asking simply whether employees agree or disagree on what a firm’s culture is.” Goldberg found that companies in which there is disagreement about culture are less efficient, while companies that embrace a diverse culture are more innovative.
In the end, Goldberg says, it is easy to be captivated by the shininess of “machine learning,” by the aura of omniscience this phrase carries in contemporary culture. “These tools — they’re fancy, fun, and cool,” Goldberg admits. But a fundamental question must inform the academic application of machine learning. “Why do you need to use it?”
Beyond the trendiness, he notes, researchers ought to have a clear and compelling case for exploring gigantic datasets with sophisticated and often opaque algorithms. The work may be a magnet for funding and attention, but is it necessary?
“In our case, using a linguistic model on Glassdoor reviews helped us measure this nebulous thing called ‘culture’ in a way we otherwise could not have,” Goldberg says. “What we learned from this work we simply could not have learned otherwise.”