Facing the Unsettling Power of AI to Analyze Our Photos

Michal Kosinski talks about exposing the dangers of new technologies and the controversies that come with them.

August 09, 2021

by Lee Simmons

Companies and governments are collecting our personal data wherever they find it — including the images we share online. | Cory Hall

Michal Kosinski’s research makes people uncomfortable. “As it should,” he says. “The privacy risks we uncover in our research should make anyone uncomfortable.”

In his most recent study, published earlier this year in Scientific Reports, Kosinski fed more than 1 million social media profile photos into a widely used facial recognition algorithm and found that it could correctly predict a person’s self-identified political ideology 72% of the time. In contrast, humans got it right 55% of the time.

Kosinski, an associate professor of organizational behavior at Stanford Graduate School of Business, does not see this as a breakthrough but rather a wake-up call. He hopes that his findings will alert people (and policymakers) to the misuse of this rapidly emerging technology.

Kosinski’s latest work builds on his 2018 paper in which he found that one of the most popular facial recognition algorithms, likely without its developers’ knowledge, could sort people based on their stated sexual orientation with startling accuracy. “We were surprised — and scared — by the results,” he recalls. When they reran the experiment with different faces, “the results held up.”

That study sparked a firestorm. Kosinski’s critics said he was engaging in “AI phrenology” and enabling digital discrimination. He responded that his detractors were shooting the messenger for publicizing the invasive and nefarious uses of a technology that is already widespread but whose threats to privacy are still relatively poorly understood.

He admits that his approach presents a paradox: “Many people have not yet realized that this technology has a dangerous potential. By running studies of this kind and trying to quantify the dangerous potential of those technologies, I am, of course, informing the general public, journalists, politicians, and dictators that, ‘Hey, this off-the-shelf technology has these dangerous properties.’ And I fully recognize this challenge.”

Kosinski stresses that he does not develop any artificial intelligence tools; he’s a psychologist who wants to better understand existing technologies and their potential to be used for good or ill. “Our lives are increasingly touched by the algorithms,” he says. Companies and governments are collecting our personal data wherever they can find it — and that includes the personal photos we publish online.

Kosinski spoke to Insights about the controversies surrounding his work and the implications of its findings.

How did you get interested in these issues?

I was looking at how digital footprints could be used to measure psychological traits, and I realized there was a huge privacy issue here that wasn’t fully appreciated at the time. In some early work, for instance, I showed that our Facebook likes reveal a lot more about us than we may realize. As I was looking at Facebook profiles, it struck me that profile pictures can also be revealing about our intimate traits. We all realize, of course, that faces reveal age, gender, emotions, fatigue, and a range of other psychological states and traits. But looking at the data produced by the facial recognition algorithms indicated that they can classify people based on intimate traits that are not obvious to humans, such as personality or political orientation. I couldn’t believe the results at the time.

I was trained as a psychologist, and the notion that you could learn something about such intimate psychological traits from a person’s appearance sounded like old-fashioned pseudoscience. Now, having thought a lot more about this, it strikes me as odd that we could ever think that our facial appearance should not be linked with our characters.

Surely we all make assumptions about people based on their appearance.

Of course. Lab studies show that we make these judgments instantly and automatically. Show someone a face for a fraction of a second and they’ll have an opinion about that person. You can’t not do it. Ask a bunch of test subjects how smart a person is, how trustworthy, how liberal, how efficient, and you get very consistent answers.

Yet those judgments are not very accurate. In my studies where subjects were asked to look at social media photos and predict people’s sexual orientation or political views, the answers were only about 55% to 60% correct. Random guessing would get you 50%; that’s rather poor accuracy. And studies have shown this to be true for other traits as well: The opinions are consistent but often wrong. Still, the fact that people consistently show some accuracy shows that faces must be, to some degree, linked with personal traits.

You found that a facial recognition algorithm achieved much higher accuracy.

Right. In my study focused on political views, the machine got it right 72% of the time. And this was just an off-the-shelf algorithm running on my laptop, so there’s no reason to think that’s the best the machines can do.

I want to stress here that I did not train the algorithm to predict intimate traits, and I would never do so. Nobody should even be thinking about that before there are regulatory frameworks in place. I have shown that general purpose face-recognition software that’s available for free online can classify people based on their political views. It’s certainly not as good as what companies like Google or Facebook are already using.

What this tells us is that there’s a lot more information in the picture than people are able to perceive. Computers are just much better than humans at recognizing visual patterns in huge data sets. And the ability of the algorithms to interpret that information really introduces something new into the world.
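To put the accuracy figures quoted above on a common footing, a quick calculation helps. The following is only an illustrative sketch: it assumes a two-way choice in which random guessing scores 50%, as stated in the interview, and the function name `margin_over_chance` is made up for this example.

```python
def margin_over_chance(accuracy_pct: int, chance_pct: int = 50) -> int:
    """Percentage points by which an accuracy exceeds random guessing."""
    return accuracy_pct - chance_pct


# Figures quoted in the article, in whole percentage points.
algorithm_margin = margin_over_chance(72)  # off-the-shelf algorithm
human_margin = margin_over_chance(55)      # human judges

# How many times larger the algorithm's edge over chance is.
edge_ratio = algorithm_margin / human_margin

print(f"Algorithm: {algorithm_margin} points above chance")
print(f"Humans: {human_margin} points above chance")
print(f"Ratio of margins: {edge_ratio:.1f}")
```

Read this way, the algorithm's edge over a coin flip is more than four times the human one, even though it still misclassifies a sizable share of people.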

So what happens when you combine that with the ubiquity of cameras today?

That’s the big question. I think people still feel that they can protect their privacy to some extent by making smart decisions and being careful about their security online. But there are closed-circuit TVs and surveillance cameras everywhere now, and we can’t hide our faces when we’re going about in public. We have no choice about whether we disclose this information — there’s no opt-in consent. And of course there are whole databases of ID photos that could be exploited by authorities. It changes the situation drastically.

Are there things people can do, like wearing masks, to make themselves more inscrutable to algorithms like this?

Probably not. You can wear a mask, but then the algorithm would just make predictions based on your forehead or eyes. Or if liberals suddenly tried to wear cowboy hats, the algorithm would be confused for the first three instances, then learn that cowboy hats are now meaningless when it comes to those predictions and adjust its beliefs.

Moreover, the key point here is that even if we could somehow hide our faces, predictions can be derived from myriad other types of data: voice recordings, clothing style, purchase records, web-browsing logs, and so on.
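The cowboy-hat scenario can be made concrete with a toy simulation. This is a hypothetical sketch, not the algorithm used in the study: it assumes a system that estimates how often a visible cue (here, a hat) goes with one label, using only its most recent observations. The class name `WindowedCueEstimator`, the window size, and the data are all invented for illustration.

```python
from collections import deque


class WindowedCueEstimator:
    """Toy estimate of P(label = 1 | cue present) from recent observations only."""

    def __init__(self, window: int = 20) -> None:
        # Only the last `window` labels are kept; older ones are forgotten.
        self.recent = deque(maxlen=window)

    def update(self, label: int) -> None:
        """Record the label (0 or 1) of one more person showing the cue."""
        self.recent.append(label)

    def estimate(self) -> float:
        """Share of recent cue-wearers with label 1 (0.5 if nothing seen yet)."""
        if not self.recent:
            return 0.5
        return sum(self.recent) / len(self.recent)


def simulate(window: int = 20) -> tuple[float, float]:
    estimator = WindowedCueEstimator(window)

    # Phase 1 (invented data): every hat wearer has label 1.
    for _ in range(50):
        estimator.update(1)
    before = estimator.estimate()

    # Phase 2 (invented data): hat wearers now alternate between labels,
    # so the hat no longer carries any information.
    for i in range(50):
        estimator.update(i % 2)
    after = estimator.estimate()

    return before, after


before, after = simulate()
print(f"Estimate while the cue is informative: {before:.2f}")
print(f"Estimate after the cue stops being informative: {after:.2f}")
```

Once the old observations fall out of the window, the estimate settles at 0.5, meaning the cue no longer shifts the prediction either way. Real systems are far more complex, but the same logic applies: a signal that stops being informative stops being used.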

What is your response to people who liken this kind of research to phrenology or physiognomy?

Those people are jumping to conclusions a bit too early, because we’re not really talking about faces here. We are talking about facial appearance and facial images, which contain a lot of non-facial factors that are not biological, such as self-presentation, image quality, head orientation, and so on. In this recent paper I do not focus at all on biological aspects such as the shape of facial features, but simply show that algorithms can extract political orientation from facial images. I think that it is pretty intuitive that style, fashion, affluence, cultural norms, and environmental factors differ between liberals and conservatives and are reflected in our facial images.

Why did you decide to focus on sexual orientation in the earlier paper?

When we began to grasp the invasive potential of this, we thought one of the greatest threats — given how widespread homophobia still is and the real risk of persecution in some countries — was that it might be used to try to identify people’s sexual orientation. And when we tested it, we were surprised — and scared — by the results. We actually reran the experiment with different faces, because I just couldn’t believe that those algorithms — ostensibly designed to recognize people across different images — were, in fact, classifying people according to their sexual orientation with such high accuracy. But the results held up.

Also, we were reluctant to publish our results. We first shared them with groups that work to protect the rights of LGBTQ communities and with policymakers in the context of conferences focused on online security. It was only after two or three years, and only after we found press articles reporting on startups offering such technologies, that we decided to publish our results in a scientific journal. We wanted to make sure that the general public and policymakers were aware that those startups are, actually, onto something, and that this space is in urgent need of scrutiny and regulation.

Is there a risk that this tech could be wielded for commercial purposes?

It’s not a risk, it’s a reality. Once I realized that faces seem to be revealing about intimate traits, I did some research on patent applications. It turns out that back in 2008 through 2012, there were already patents filed by startups to do exactly that, and there are websites claiming to offer precisely those kinds of services. It was shocking to me, and it’s also usually shocking to readers of my work, because they think I came up with this, or at least that I revealed the potential so others could exploit it. In fact, there is already an industry pursuing this kind of invasive activity.

There’s a broader lesson here, which is that we can’t protect citizens by trying to conceal what we learn about the risks inherent in new technologies. People with a financial incentive are going to get there first. What we need is for policymakers to step up and acknowledge the serious privacy risks inherent in face-recognition systems so we can create regulatory guardrails.

Have you ever put your own photo through any of these algorithms, if only out of curiosity?

I believe that there are just much better methods of self-discovery than running one’s photo through an algorithm. The whole point of my research is that the algorithms should not be used for this purpose. I’ve never run my photo through it and I do not think anyone else should either.
