The DARC Arts: How the Wizardry of This Computer Group Helps Enable Research
When faculty need a hand unlocking data, this is who they call.
August 25, 2022
DARC’s Mason Jiang, left, and Alex Storer deal with enormous, complex datasets. | Photo by Elena Zhukova
Mason Jiang knew that what he was being asked to do was daunting, and maybe impossible. Political economy professor Greg Martin wanted to harvest all the political ads published on Facebook during the 2018 election season for a study comparing online and offline advertising.
For Jiang, a research analytics scientist in the Data, Analytics, and Research Computing group, this was a dataset unlike any he had encountered before. Although Facebook had given Martin permission to download more than 600,000 ads, it would take weeks of work to make the data accessible. Jiang needed to identify and separate not only each ad but also the text, images, audio, and video contained in each one, an ocean of data so enormous that it took DARC’s computers six weeks, running 24/7, to process it all.
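The article does not describe Jiang’s actual pipeline, but the basic step it alludes to, splitting each downloaded ad into its text, image, and video components, might look something like the following minimal sketch. The field names (`ad_text`, `image_urls`, `video_urls`) and the file name are illustrative assumptions, not the real export schema.

```python
import json
from pathlib import Path

# Illustrative only: split a downloaded ad archive into per-modality records.
# The field names below are assumptions; the study's actual export schema
# is not described in the article.

def split_ad(ad: dict) -> dict:
    """Pull the text, image, and video references out of one ad record."""
    return {
        "id": ad.get("id"),
        "text": ad.get("ad_text", ""),
        "images": ad.get("image_urls", []),
        "videos": ad.get("video_urls", []),
    }

def process_archive(path: str) -> list[dict]:
    """Load a JSON export of ads and separate each one into its components."""
    records = json.loads(Path(path).read_text())
    return [split_ad(ad) for ad in records]

if __name__ == "__main__":
    ads = process_archive("ads_2018.json")  # hypothetical file name
    print(f"Separated {len(ads)} ads into text, image, and video components")
```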
Five months after he started, Jiang presented Martin with the Facebook data, ready to use. In February 2021, Martin, along with five co-authors, published “Political Advertising Online and Offline” in American Political Science Review. “Mason’s assistance on the project was a huge productivity enhancement,” Martin says. “It would have taken us much longer to complete the project on our own.”
Was it the most complicated project Jiang had ever done? “It was up there,” he says. Yet some version of this is what the seven-member DARC team does every day. Its job, in a nutshell, is to identify technical hurdles in faculty research projects and find ways to overcome them.
DARC is part of an evolving research support structure under the umbrella of the Research Hub, which also includes the business library; the hub’s technical experts and data specialists form a cross-functional unit to assist faculty. “That’s kind of the secret sauce because everybody’s solving the problems together,” says Julie Williamsen, executive director and assistant dean of the Research Hub.
DARC director Alex Storer says the work professors bring to his team usually comes in two forms. “One is, I have this spectacular dataset, and these are my hypotheses. And I’m not 100% sure how to extract what I’m looking for out of this data. And then another is, this is my general research question. Do you know what the magic dataset is that has X, Y, Z?”
The data DARC handles could be just about anything, says Storer. “A book is data. It could be recordings of conversations. It could be scanned PDFs from the archives of 17th-century France.”
However, Storer adds, in many cases the data are not in a quantitative form that researchers can easily use. “It’s not uncommon for our team to be deeply involved in figuring out how to construct and measure the variable of interest that will be associated with that data,” he says.
A recent example is a project that looked at the effects of video conferencing on collaboration. As a proxy for “attentiveness,” marketing professor Jonathan Levav and co-researcher Melanie Brucks sought to analyze where people’s gaze fell during a one-on-one meeting. Videos of the Zoom meetings were processed using a machine learning algorithm that could infer where people were looking based on head tilt and eye direction. “There’s a breathtaking amount of data cleaning and assumptions that go into that that has a huge impact on the quality of the data,” Storer says. “If our team were to screw it up, the data would be completely wrong.”
What Storer is getting at is a critical part of data analysis. “Clean” data is clean because the inputs have a consistency and uniformity that provide high confidence in the data’s credibility. In the case of the Zoom experiment, the DARC specialists needed a reliable way of calibrating the participants’ gaze as the meetings took place. “We could say, please keep your head this far from the screen, and please look to the left and up, look to the left and down, tilt your head as far as you’re comfortable tilting it, and then use those measurements to gauge the gaze direction,” he says.
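To make the calibration idea concrete, here is a minimal sketch of one way such measurements could be used: fit a simple per-participant linear map from raw features (say, estimated head tilt and eye direction) to gaze angles, using the poses collected while the participant looked at known targets. This is an assumption about the approach for illustration, not the team’s actual pipeline, and the numbers below are made up.

```python
import numpy as np

# Sketch of the calibration step described above (not DARC's actual code):
# fit a least-squares map from raw pose features to gaze angles using
# measurements taken while the participant looked at known targets.

def fit_calibration(features: np.ndarray, known_gaze: np.ndarray) -> np.ndarray:
    """Fit a linear map (with intercept) from features (n x d) to gaze angles (n x 2)."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # add intercept column
    coeffs, *_ = np.linalg.lstsq(X, known_gaze, rcond=None)
    return coeffs

def apply_calibration(features: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Map new feature measurements to calibrated gaze angles."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ coeffs

# Hypothetical example: four calibration poses (left/up, left/down, right/up, right/down).
calib_features = np.array([[-0.4, 0.3], [-0.4, -0.3], [0.4, 0.3], [0.4, -0.3]])
calib_targets = np.array([[-15.0, 10.0], [-15.0, -10.0], [15.0, 10.0], [15.0, -10.0]])
coeffs = fit_calibration(calib_features, calib_targets)
print(apply_calibration(np.array([[0.0, 0.0]]), coeffs))  # roughly straight ahead
```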
Doing this kind of fine-tuning after the fact requires an entirely different level of expertise. Enter Jiang. “Mason got his PhD in experimental physics, basically by shooting lasers at things,” Storer notes. He was brought in to determine how “dirty” the Zoom data was, and to perform the hygiene necessary to produce usable results.
DARC’s work may go unnoticed by readers and reviewers, but GSB faculty prize it. “When a department is recruiting new faculty, they like to send them to talk to us,” Williamsen says. “Someone they were recruiting recently said we were the one group she specifically asked to talk to because she had heard how much we can support her research.”
“The existence of the DARC team means the faculty don’t have to rely on raw documents for quantitative data or to be experts in data analysis,” says Storer. “It opens up new directions for the research that wouldn’t have been possible.”