4

DIGITAL TRUTH SERUM

Everybody lies.

People lie about how many drinks they had on the way home. They lie about how often they go to the gym, how much those new shoes cost, whether they read that book. They call in sick when they’re not. They say they’ll be in touch when they won’t. They say it’s not about you when it is. They say they love you when they don’t. They say they’re happy while in the dumps. They say they like women when they really like men.

People lie to friends. They lie to bosses. They lie to kids. They lie to parents. They lie to doctors. They lie to husbands. They lie to wives. They lie to themselves.

And they damn sure lie to surveys.

Here’s my brief survey for you:

Have you ever cheated on an exam? __________

Have you ever fantasized about killing someone? _________

Were you tempted to lie? Many people underreport embarrassing behaviors and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias.

An important paper in 1950 provided powerful evidence of how surveys can fall victim to such bias. Researchers collected data, from official sources, on the residents of Denver: what percentage of them voted, gave to charity, and owned a library card. They then surveyed the residents to see if the percentages would match. The results were, at the time, shocking. What the residents reported to the surveys was very different from the data the researchers had gathered. Even though nobody gave their names, people, in large numbers, exaggerated their voter registration status, voting behavior, and charitable giving.

Has anything changed in sixty-five years? In the age of the internet, not owning a library card is no longer embarrassing. But, while what’s embarrassing or desirable may have changed, people’s tendency to deceive pollsters remains strong.

A recent survey asked University of Maryland graduates various questions about their college experience. The answers were compared to official records. People consistently gave wrong information, in ways that made them look good. Fewer than 2 percent reported that they graduated with lower than a 2.5 GPA. (In reality, about 11 percent did.) And 44 percent said they had donated to the university in the past year. (In reality, about 28 percent did.)

And it is certainly possible that lying played a role in the failure of the polls to predict Donald Trump’s 2016 victory. Polls, on average, underestimated his support by about 2 percentage points. Some people may have been embarrassed to say they were planning to support him. Some may have claimed they were undecided when they were really going Trump’s way all along.

Why do people misinform anonymous surveys? I asked Roger Tourangeau, a research professor emeritus at the University of Michigan and perhaps the world’s foremost expert on social desirability bias. Our weakness for “white lies” is an important part of the problem, he explained. “About one-third of the time, people lie in real life,” he suggests. “The habits carry over to surveys.”

Then there’s that odd habit we sometimes have of lying to ourselves. “There is an unwillingness to admit to yourself that, say, you were a screw-up as a student,” says Tourangeau.

Lying to oneself may explain why so many people say they are above average. How big is this problem? More than 40 percent of one company’s engineers said they are in the top 5 percent. More than 90 percent of college professors say they do above-average work. One-quarter of high school seniors think they are in the top 1 percent in their ability to get along with other people. If you are deluding yourself, you can’t be honest in a survey.

Another factor that plays into our lying to surveys is our strong desire to make a good impression on the stranger conducting the interview, if there is someone conducting the interview, that is. As Tourangeau puts it, “A person who looks like your favorite aunt walks in. . . . Do you want to tell your favorite aunt you used marijuana last month?”* Do you want to admit that you didn’t give money to your good old alma mater?

For this reason, the more impersonal the conditions, the more honest people will be. For eliciting truthful answers, internet surveys are better than phone surveys, which are better than in-person surveys. People will admit more if they are alone than if others are in the room with them.

However, on sensitive topics, every survey method will elicit substantial misreporting. Tourangeau here used a word that is often thrown around by economists: “incentive.” People have no incentive to tell surveys the truth.

How, therefore, can we learn what our fellow humans are really thinking and doing?

In some instances, there are official data sources we can reference to get the truth. Even if people lie about their charitable donations, for example, we can get real numbers about giving in an area from the charities themselves. But when we are trying to learn about behaviors that are not tabulated in official records or we are trying to learn what people are thinking—their true beliefs, feelings, and desires—there is no other source of information except what people may deign to tell surveys. Until now, that is.

This is the second power of Big Data: certain online sources get people to admit things they would not admit anywhere else. They serve as a digital truth serum. Think of Google searches. Remember the conditions that make people more honest. Online? Check. Alone? Check. No person administering a survey? Check.

And there’s another huge advantage that Google searches have in getting people to tell the truth: incentives. If you enjoy racist jokes, you have zero incentive to share that un-PC fact with a survey. You do, however, have an incentive to search for the best new racist jokes online. If you think you may be suffering from depression, you don’t have an incentive to admit this to a survey. You do have an incentive to ask Google for symptoms and potential treatments.

Even if you are lying to yourself, Google may nevertheless know the truth. A couple of days before the election, you and some of your neighbors may legitimately think you will drive to a polling place and cast ballots. But, if you and they haven’t searched for any information on how to vote or where to vote, data scientists like me can figure out that turnout in your area will actually be low. Similarly, maybe you haven’t admitted to yourself that you may suffer from depression, even as you’re Googling about crying jags and difficulty getting out of bed. You would show up, however, in an area’s depression-related searches that I analyzed earlier in this book.

Think of your own experience using Google. I am guessing you have upon occasion typed things into that search box that reveal a behavior or thought that you would hesitate to admit in polite company. In fact, the evidence is overwhelming that a large majority of Americans are telling Google some very personal things. Americans, for instance, search for “porn” more than they search for “weather.” This is difficult, by the way, to reconcile with the survey data since only about 25 percent of men and 8 percent of women admit they watch pornography.

You may have also noticed a certain honesty in Google searches when looking at the way this search engine automatically tries to complete your queries. Its suggestions are based on the most common searches that other people have made. So auto-complete clues us in to what people are Googling. In fact, auto-complete can be a bit misleading. Google won’t suggest certain words it deems inappropriate, such as “cock,” “fuck,” and “porn.” This means auto-complete tells us that people’s Google thoughts are less racy than they actually are. Even so, some sensitive stuff often still comes up.

If you type “Why is . . .” the first two Google auto-completes currently are “Why is the sky blue?” and “Why is there a leap day?” suggesting these are the two most common ways to complete this search. The third: “Why is my poop green?” And Google auto-complete can get disturbing. Today, if you type in “Is it normal to want to . . . ,” the first suggestion is “kill.” If you type in “Is it normal to want to kill . . . ,” the first suggestion is “my family.”

Need more evidence that Google searches can give a different picture of the world than the one we usually see? Consider searches related to regrets around the decision to have or not to have children. Before deciding, some people fear they might make the wrong choice. And, almost always, the question is whether they will regret not having kids. People are seven times more likely to ask Google whether they will regret not having children than whether they will regret having children.

After making their decision—either to reproduce (or adopt) or not—people sometimes confess to Google that they rue their choice. This may come as something of a shock but post-decision, the numbers are reversed. Adults with children are 3.6 times more likely to tell Google they regret their decision than are adults without children.

One caveat that should be kept in mind throughout this chapter: Google can display a bias toward unseemly thoughts, thoughts people feel they can’t discuss with anyone else. Nonetheless, if we are trying to uncover hidden thoughts, Google’s ability to ferret them out can be useful. And the large disparity between regrets on having versus not having kids seems to be telling us that the unseemly thought in this case is a significant one.

Let’s pause for a moment to consider what it even means to make a search such as “I regret having children.” Google presents itself as a source from which we can seek information directly, on topics like the weather, who won last night’s game, or when the Statue of Liberty was erected. But sometimes we type our uncensored thoughts into Google, without much hope that it will be able to help us. In this case, the search window serves as a kind of confessional.

There are thousands of searches every year, for example, for “I hate cold weather,” “People are annoying,” and “I am sad.” Of course, those thousands of Google searches for “I am sad” represent only a tiny fraction of the hundreds of millions of people who feel sad in a given year. My research has found that searches expressing thoughts, rather than looking for information, are made by only a small sample of everyone to whom that thought occurs. Similarly, my research suggests that the seven thousand searches by Americans every year for “I regret having children” represent a small sample of those who have had that thought.

Kids are obviously a huge joy for many, probably most, people. And, despite my mom’s fear that “you and your stupid data analysis” are going to limit her number of grandchildren, this research has not changed my desire to have kids. But that unseemly regret is interesting—and another aspect of humanity that we tend not to see in the traditional datasets. Our culture is constantly flooding us with images of wonderful, happy families. Most people would never consider having children as something they might regret. But some do. They may admit this to no one—except Google.

THE TRUTH ABOUT SEX

How many American men are gay? This is a legendary question in sexuality research. Yet it has been among the toughest questions for social scientists to answer. Psychologists no longer believe Alfred Kinsey’s famous estimate—based on surveys that oversampled prisoners and prostitutes—that 10 percent of American men are gay. Representative surveys now tell us about 2 to 3 percent are. But sexual preference has long been among the subjects upon which people have tended to lie. I think I can use Big Data to give a better answer to this question than we have ever had.

First, more on that survey data. Surveys tell us there are far more gay men in tolerant states than in intolerant states. For example, according to a Gallup survey, the proportion of the population that is gay is almost twice as high in Rhode Island, the state with the highest support for gay marriage, as in Mississippi, the state with the lowest support for gay marriage.

There are two likely explanations for this. First, gay men born in intolerant states may move to tolerant states. Second, gay men in intolerant states may not divulge that they are gay; they are even more likely to lie.

Some insight into explanation number one—gay mobility—can be gleaned from another Big Data source: Facebook, which allows users to list what gender they are interested in. About 2.5 percent of male Facebook users who list a gender of interest say they are interested in men; that corresponds roughly with what the surveys indicate. And Facebook too shows big differences in the gay population in states with high versus low tolerance: Facebook has the gay population more than twice as high in Rhode Island as in Mississippi.

Facebook also can provide information on how people move around. I was able to code the hometown of a sample of openly gay Facebook users. This allowed me to directly estimate how many gay men move out of intolerant states into more tolerant parts of the country. The answer? There is clearly some mobility—from Oklahoma City to San Francisco, for example. But I estimate that men packing up their Judy Garland CDs and heading to someplace more open-minded can explain less than half of the difference in the openly gay population in tolerant versus intolerant states.*

In addition, Facebook allows us to focus in on high school students. This is a special group, because high school boys rarely get to choose where they live. If mobility explained the state-by-state differences in the openly gay population, these differences should not appear among high school users. So what does the high school data say? There are far fewer openly gay high school boys in intolerant states. Only two in one thousand male high school students in Mississippi are openly gay. So it ain’t just mobility.

If a similar number of gay men are born in every state and mobility cannot fully explain why some states have so many more openly gay men, the closet must be playing a big role. Which brings us back to Google, with which so many people have proved willing to share so much.

Might there be a way to use porn searches to test how many gay men there really are in different states? Indeed, there is. Countrywide, I estimate—using data from Google searches and Google AdWords—that about 5 percent of male porn searches are for gay-male porn. (These would include searches for such terms as “Rocket Tube,” a popular gay pornographic site, as well as “gay porn.”)

And how does this vary in different parts of the country? Overall, there are more gay porn searches in tolerant states compared to intolerant states. This makes sense, given that some gay men move out of intolerant places into tolerant places. But the differences are not nearly as large as the differences suggested by either surveys or Facebook. In Mississippi, I estimate that 4.8 percent of male porn searches are for gay porn, far higher than the numbers suggested by either surveys or Facebook and reasonably close to the 5.2 percent of pornography searches that are for gay porn in Rhode Island.

So how many American men are gay? This measure of pornography searches by men—roughly 5 percent are same-sex—seems a reasonable estimate of the true size of the gay population in the United States. And there is another, less straightforward way to get at this number. It requires some data science. We could utilize the relationship between tolerance and the openly gay population. Bear with me a bit here.

My preliminary research indicates that in a given state every 20 percentage points of support for gay marriage means about one and a half times as many men from that state will identify openly as gay on Facebook. Based on this, we can estimate how many men born in a hypothetically fully tolerant place—where, say, 100 percent of people supported gay marriage—would be openly gay. My estimate is about 5 percent would be, which fits the data from porn searches nicely. The closest we might have to growing up in a fully tolerant environment is high school boys in California’s Bay Area. About 4 percent of them are openly gay on Facebook. That seems in line with my calculation.
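The extrapolation described above can be sketched as a simple multiplicative model. This is only an illustration, not the calculation actually used in the research: the function name, the compounding form, and the baseline state (40 percent support for gay marriage, 1 percent of men openly gay) are assumptions invented for the example. Only the "one and a half times per 20 percentage points" figure comes from the text.

```python
def extrapolate_open_share(base_share, base_support, target_support,
                           factor=1.5, step=20.0):
    """Scale an openly gay share from one support level to another.

    Assumes the share is multiplied by `factor` for every `step`
    percentage points of additional support for gay marriage
    (the 1.5x-per-20-points relationship quoted in the text).
    """
    steps = (target_support - base_support) / step
    return base_share * factor ** steps


# Hypothetical baseline, NOT taken from the text: a state with 40 percent
# support for gay marriage where 1 percent of men are openly gay.
hypothetical_share = 0.01
hypothetical_support = 40.0

# Project to a fully tolerant place with 100 percent support.
fully_tolerant_share = extrapolate_open_share(
    hypothetical_share, hypothetical_support, 100.0
)
print(f"Projected openly gay share: {fully_tolerant_share:.2%}")
```

With this made-up baseline, three steps of 20 points multiply the share by 1.5 three times, or about 3.4 times overall. A real estimate would depend on the actual state-level figures, which are not given here.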

I should note that I have not yet been able to come up with an estimate of same-sex attraction for women. The pornography numbers are less useful here, since far fewer women watch pornography, making the sample less representative. And of those who do, even women who are primarily attracted to men in real life seem to enjoy viewing lesbian porn. Fully 20 percent of videos watched by women on PornHub are lesbian.

Five percent of American men being gay is an estimate, of course. Some men are bisexual; some—especially when young—are not sure what they are. Obviously, you can’t count this as precisely as you might the number of people who vote or attend a movie.

But one consequence of my estimate is clear: an awful lot of men in the United States, particularly in intolerant states, are still in the closet. They don’t reveal their sexual preferences on Facebook. They don’t admit it on surveys. And in many cases, they may even be married to women.

It turns out that wives suspect their husbands of being gay rather frequently. They demonstrate that suspicion in the surprisingly common search: “Is my husband gay?” “Gay” is 10 percent more likely to complete searches that begin “Is my husband . . .” than the second-place word, “cheating.” It is eight times more common than “an alcoholic” and ten times more common than “depressed.”

Most tellingly perhaps, searches questioning a husband’s sexuality are far more prevalent in the least tolerant regions. The states with the highest percentage of women asking this question are South Carolina and Louisiana. In fact, in twenty-one of the twenty-five states where this question is most frequently asked, support for gay marriage is lower than the national average.

Google and porn sites aren’t the only useful data resources when it comes to men’s sexuality. There is more evidence available in Big Data on what it means to live in the closet. I analyzed ads on Craigslist for males looking for “casual encounters.” The percentage of these ads that are seeking casual encounters with men tends to be larger in less tolerant states. Among the states with the highest percentages are Kentucky, Louisiana, and Alabama.

And for even more of a glimpse into the closet, let’s return to Google search data and get a little more granular. One of the most common searches made immediately before or after “gay porn” is “gay test.” (These tests presume to tell men whether or not they are homosexual.) And searches for “gay test” are about twice as prevalent in the least tolerant states.

What does it mean to go back and forth between searching for “gay porn” and searching for “gay test”? Presumably, it suggests a fairly confused if not tortured mind. It’s reasonable to suspect that some of these men are hoping to confirm that their interest in gay porn does not actually mean they’re gay.

The Google search data does not allow us to see a particular user’s search history over time. However, in 2006, AOL released a sample of their users’ searches to academic researchers. Here are some of one anonymous user’s searches over a six-day period.

Friday 03:49:55       free gay picks [sic]
Friday 03:59:37       locker room gay picks
Friday 04:00:14       gay picks
Friday 04:00:35       gay sex picks
Friday 05:08:23       a long gay quiz
Friday 05:10:00       a good gay test
Friday 05:25:07       gay tests for a confused man
Friday 05:26:38       gay tests
Friday 05:27:22       am i gay tests
Friday 05:29:18       gay picks
Friday 05:30:01       naked men picks
Friday 05:32:27       free nude men picks
Friday 05:38:19       hot gay sex picks
Friday 05:41:34       hot man butt sex
Wednesday 13:37:37    am i gay tests
Wednesday 13:41:20    gay tests
Wednesday 13:47:49    hot man butt sex
Wednesday 13:50:31    free gay sex vidio [sic]

This certainly reads like a man who is not comfortable with his sexuality. And the Google data tells us there are still many men like him. Most of them, in fact, live in states that are less tolerant of same-sex relationships.

For an even closer look at the people behind these numbers, I asked a psychiatrist in Mississippi, who specializes in helping closeted gay men, if any of his patients might want to talk to me. One man reached out. He told me he was a retired professor, in his sixties, and married to the same woman for more than forty years.

About ten years ago, overwhelmed with stress, he saw the psychiatrist and finally acknowledged his sexuality. He has always known he was attracted to men, he says, but thought that this was universal and something that all men just hid. Shortly after beginning therapy, he had his first, and only, gay sexual encounter, with a student of his who was in his late twenties, an experience he describes as “wonderful.”

He and his wife do not have sex. He says that he would feel guilty ever ending his marriage or openly dating a man. He regrets virtually every one of his major life decisions.

The retired professor and his wife will go another night without romantic love, without sex. Despite enormous progress, the persistence of intolerance will cause millions of other Americans to do the same.

You may not be shocked to learn that 5 percent of men are gay and that many remain in the closet. There have been times when most people would have been shocked. And there are still places where many people would be shocked as well.

“In Iran we don’t have homosexuals like in your country,” Mahmoud Ahmadinejad, then president of Iran, insisted in 2007. “In Iran we do not have this phenomenon.” Likewise, Anatoly Pakhomov, mayor of Sochi, Russia, shortly before his city hosted the 2014 Winter Olympics, said of gay people, “We do not have them in our city.” Yet internet behavior reveals significant interest in gay porn in Sochi and Iran.

This raises an obvious question: are there any common sexual interests in the United States today that are still considered shocking? It depends what you consider common and how easily shocked you are.

Most of the top searches on PornHub are not surprising—they include terms like “teen,” “threesome,” and “blowjob” for men, phrases like “passionate love making,” “nipple sucking,” and “man eating pussy” for women.

Leaving the mainstream, PornHub data does tell us about some fetishes that you might not have ever guessed existed. There are women who search for “anal apples” and “humping stuffed animals.” There are men who search for “snot fetish” and “nude crucifixion.” But these searches are rare—only about ten every month even on this huge porn site.

Another related point that becomes quite clear when reviewing PornHub data: there’s someone out there for everyone. Women, not surprisingly, often search for “tall” guys, “dark” guys, and “handsome” guys. But they also sometimes search for “short” guys, “pale” guys, and “ugly” guys. There are women who search for “disabled” guys, “chubby guy with small dick,” and “fat ugly old man.” Men frequently search for “thin” women, women with “big tits,” and women with “blonde” hair. But they also sometimes search for “fat” women, women with “tiny tits,” and women with “green hair.” There are men who search for “bald” women, “midget” women, and women with “no nipples.” This data can be cheering for those who are not tall, dark, and handsome or thin, big-breasted, and blonde.*

What about other searches that are both common and surprising? Among the 150 most common searches by men, the most surprising for me are the incestuous ones I discussed in the chapter on Freud. Other little-discussed objects of men’s desire are “shemales” (77th most common search) and “granny” (110th most common search). Overall, about 1.4 percent of men’s PornHub searches are for women with penises. About 0.6 percent (0.4 percent for men under the age of thirty-four) are for the elderly. Only 1 in 24,000 PornHub searches by men are explicitly for preteens; that may have something to do with the fact that PornHub, for obvious reasons, bans all forms of child pornography and possessing it is illegal.

Among the top PornHub searches by women is a genre of pornography that, I warn you, will disturb many readers: sex featuring violence against women. Fully 25 percent of female searches for straight porn emphasize the pain and/or humiliation of the woman—“painful anal crying,” “public disgrace,” and “extreme brutal gangbang,” for example. Five percent look for nonconsensual sex—“rape” or “forced” sex—even though these videos are banned on PornHub. And search rates for all these terms are at least twice as common among women as among men. If there is a genre of porn in which violence is perpetrated against a woman, my analysis of the data shows that it almost always appeals disproportionately to women.

Of course, when trying to come to terms with this, it is really important to remember that there is a difference between fantasy and real life. Yes, of the minority of women who visit PornHub, there is a subset who search—unsuccessfully—for rape imagery. To state the obvious, this does not mean women want to be raped in real life and it certainly doesn’t make rape any less horrific a crime. What the porn data does tell us is that sometimes people have fantasies they wish they didn’t have and which they may never mention to others.

Closets are not just repositories of fantasies. When it comes to sex, people keep many secrets—about how much they are having, for example.

In the introduction, I noted that Americans report using far more condoms than are sold every year. You might therefore think this means they are just saying they use condoms more often during sex than they actually do. The evidence suggests they also exaggerate how frequently they are having sex to begin with. About 11 percent of women between the ages of fifteen and forty-four say they are sexually active, not currently pregnant, and not using contraception. Even with relatively conservative assumptions about how many times they are having sex, scientists would expect 10 percent of them to become pregnant every month. But this would already be more than the total number of pregnancies in the United States (which is 1 in 113 women of childbearing age). In our sex-obsessed culture it can be hard to admit that you are just not having that much.
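The arithmetic behind this comparison can be checked in a few lines. This is a rough sketch using only the figures quoted above. It assumes, as the comparison implies but the text does not state outright, that the "1 in 113" figure is a monthly rate; the variable names are illustrative.

```python
# Figures quoted in the text.
share_unprotected = 0.11          # women 15-44 reporting unprotected sex
monthly_conception_rate = 0.10    # expected share of them conceiving per month

# Assumption: the text's "1 in 113 women of childbearing age" is per month.
actual_monthly_pregnancy_rate = 1 / 113

# Pregnancies per month implied by the survey answers alone,
# as a share of all women of childbearing age.
implied_monthly_pregnancy_rate = share_unprotected * monthly_conception_rate

ratio = implied_monthly_pregnancy_rate / actual_monthly_pregnancy_rate

print(f"Implied by survey answers: {implied_monthly_pregnancy_rate:.2%}")
print(f"Actual (all pregnancies):  {actual_monthly_pregnancy_rate:.2%}")
print(f"Implied / actual:          {ratio:.2f}")
```

Under these assumptions, the self-reported group alone would account for about 1.1 percent of women becoming pregnant each month, roughly 1 in 91, which is already more than the total of 1 in 113. Since the total also includes women who were using contraception or actively trying to conceive, the reported frequency of sex looks overstated.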

But if you’re looking for understanding or advice, you have, once again, an incentive to tell Google. On Google, there are sixteen times more complaints about a spouse not wanting sex than about a married partner not being willing to talk. There are five and a half times more complaints about an unmarried partner not wanting sex than about an unmarried partner refusing to text back.

And Google searches suggest a surprising culprit for many of these sexless relationships. There are twice as many complaints that a boyfriend won’t have sex as that a girlfriend won’t have sex. By far, the number one search complaint about a boyfriend is “My boyfriend won’t have sex with me.” (Google searches are not broken down by gender, but, since the previous analysis said that 95 percent of men are straight, we can guess that not too many “boyfriend” searches are coming from men.)

How should we interpret this? Does this really imply that boyfriends withhold sex more than girlfriends? Not necessarily. As mentioned earlier, Google searches can be biased in favor of stuff people are uptight talking about. Men may feel more comfortable telling their friends about a girlfriend’s lack of sexual interest than women feel telling their friends about a boyfriend’s. Still, even if the Google data does not imply that boyfriends are really twice as likely to avoid sex as girlfriends, it does suggest that boyfriends avoiding sex is more common than people let on.

Google data also suggests a reason people may be avoiding sex so frequently: enormous anxiety, with much of it misplaced. Start with men’s anxieties. It isn’t news that men worry about how well-endowed they are, but the degree of this worry is rather profound.

Men Google more questions about their sexual organ than any other body part: more than about their lungs, liver, feet, ears, nose, throat, and brain combined. Men conduct more searches for how to make their penises bigger than how to tune a guitar, make an omelet, or change a tire. Men’s top Googled concern about steroids isn’t whether they may damage their health but whether taking them might diminish the size of their penis. Men’s top Googled question related to how their body or mind would change as they aged was whether their penis would get smaller.

Side note: One of the more common questions for Google regarding men’s genitalia is “How big is my penis?” That men turn to Google, rather than a ruler, with this question is, in my opinion, a quintessential expression of our digital era.*

Do women care about penis size? Rarely, according to Google searches. For every search women make about a partner’s phallus, men make roughly 170 searches about their own. True, on the rare occasions women do express concerns about a partner’s penis, it is frequently about its size, but not necessarily that it’s small. More than 40 percent of complaints about a partner’s penis size say that it’s too big. “Pain” is the most Googled word used in searches with the phrase “___ during sex.” (“Bleeding,” “peeing,” “crying,” and “farting” round out the top five.) Yet only 1 percent of men’s searches looking to change their penis size are seeking information on how to make it smaller.

Men’s second-most-common sex question is how to make their sexual encounters longer. Once again, the insecurities of men do not appear to match the concerns of women. There are roughly the same number of searches asking how to make a boyfriend climax more quickly as climax more slowly. In fact, the most common concern women have related to a boyfriend’s orgasm isn’t about when it happened but why it isn’t happening at all.

We don’t often talk about body image issues when it comes to men. And while it’s true that overall interest in personal appearance skews female, it’s not as lopsided as stereotypes would suggest. According to my analysis of Google AdWords, which measures the websites people visit, interest in beauty and fitness is 42 percent male, weight loss is 33 percent male, and cosmetic surgery is 39 percent male. Among all searches with “how to” related to breasts, about 20 percent ask how to get rid of man breasts.

But, even if the number of men who lack confidence in their bodies is higher than most people would think, women still outpace them when it comes to insecurity about how they look. So what can this digital truth serum reveal about women’s self-doubt? Every year in the United States, there are more than seven million searches looking into breast implants. Official statistics tell us that about 300,000 women go through with the procedure annually.

Women also show a great deal of insecurity about their behinds, although many women have recently flip-flopped on what it is they don’t like about them.

In 2004, in some parts of the United States, the most common search regarding changing one’s butt was how to make it smaller. The desire to make one’s bottom bigger was overwhelmingly concentrated in areas with large black populations. Beginning in 2010, however, the desire for bigger butts grew in the rest of the United States. This interest, if not the posterior distribution itself, has tripled in four years. In 2014, there were more searches asking how to make your butt bigger than smaller in every state. These days, for every five searches looking into breast implants in the United States, there is one looking into butt implants. (Thank you, Kim Kardashian!)

Does women’s growing preference for a larger bottom match men’s preferences? Interestingly, yes. “Big butt porn” searches, which also used to be concentrated in black communities, have recently shot up in popularity throughout the United States.

What else do men want in a woman’s body? As mentioned earlier, and as most will find blindingly obvious, men show a preference for large breasts. About 12 percent of nongeneric pornographic searches are looking for big breasts. This is nearly twenty times higher than the search volume for small-breast porn.

That said, it is not clear that this means men want women to get breast implants. About 3 percent of big-breast porn searches explicitly say they want to see natural breasts.

Google searches about one’s wife and breast implants are evenly split between asking how to persuade her to get implants and perplexity as to why she wants them.

Or consider the most common search about a girlfriend’s breasts: “I love my girlfriend’s boobs.” It is not clear what men are hoping to find from Google when making this search.

Women, like men, have questions about their genitals. In fact, they have nearly as many questions about their vaginas as men have about their penises. Women’s worries about their vaginas are often health related. But at least 30 percent of their questions take up other concerns. Women want to know how to shave it, tighten it, and make it taste better. A strikingly common concern, as touched upon earlier, is how to improve its odor.

Women are most frequently concerned that their vaginas smell like fish, followed by vinegar, onions, ammonia, garlic, cheese, body odor, urine, bread, bleach, feces, sweat, metal, feet, garbage, and rotten meat.

In general, men do not make many Google searches involving a partner’s genitalia. Men make roughly the same number of searches about a girlfriend’s vagina as women do about a boyfriend’s penis.

When men do search about a partner’s vagina, it is usually to complain about what women worry about most: the odor. Mostly, men are trying to figure out how to tell a woman about a bad odor without hurting her feelings. Sometimes, however, men’s questions about odor reveal their own insecurities. Men occasionally ask for ways to use the smell to detect cheating—if it smells like condoms, for example, or another man’s semen.

What should we make of all this secret insecurity? There is clearly some good news here. Google gives us legitimate reasons to worry less than we do. Many of our deepest fears about how our sexual partners perceive us are unjustified. Alone, at their computers, with no incentive to lie, partners reveal themselves to be fairly nonsuperficial and forgiving. In fact, we are all so busy judging our own bodies that there is little energy left over to judge other people’s.

There is also probably a connection between two of the big concerns revealed in the sexual searches on Google: lack of sex and an insecurity about one’s sexual attractiveness and performance. Maybe these are related. Maybe if we worried less about sex, we’d have more of it.

What else can Google searches tell us about sex? We can do a battle of the sexes, to see who is most generous. Take all searches looking for ways to get better at performing oral sex on the opposite gender. Do men look for more tips or women? Who is more sexually generous, men or women? Women, duh. Adding up all the possibilities, I estimate the ratio is 2:1 in favor of women looking for advice on how to better perform oral sex on their partner.

And when men do look for tips on how to give oral sex, they are frequently not looking for ways of pleasing another person. Men make as many searches looking for ways to perform oral sex on themselves as they do how to give a woman an orgasm. (This is among my favorite facts in Google search data.)

THE TRUTH ABOUT HATE AND PREJUDICE

Sex and romance are hardly the only topics cloaked in shame and, therefore, not the only topics about which people keep secrets. Many people are, for good reason, inclined to keep their prejudices to themselves. I suppose you could call it progress that many people today feel they will be judged if they admit they judge other people based on their ethnicity, sexual orientation, or religion. But many Americans still do. (This is another section, I warn readers, that includes disturbing material.)

You can see this on Google, where users sometimes ask questions such as “Why are black people rude?” or “Why are Jews evil?” Below, in order, are the top five negative words used in searches about various groups.

[Table: top five negative words used in searches about each group]

A few patterns among these stereotypes stand out. For example, African Americans are the only group that faces a “rude” stereotype. Nearly every group is a victim of a “stupid” stereotype; the only two that are not: Jews and Muslims. The “evil” stereotype is applied to Jews, Muslims, and gays but not black people, Mexicans, Asians, and Christians.

Muslims are the only group stereotyped as terrorists. When a Muslim American plays into this stereotype, the response can be instantaneous and vicious. Google search data can give us a minute-by-minute peek into such eruptions of hate-fueled rage.

Consider what happened shortly after the mass shooting in San Bernardino, California, on December 2, 2015. That morning, Rizwan Farook and Tashfeen Malik entered a meeting of Farook’s coworkers armed with semiautomatic pistols and semiautomatic rifles and murdered fourteen people. That evening, literally minutes after the media first reported one of the shooters’ Muslim-sounding name, a disturbing number of Californians had decided what they wanted to do with Muslims: kill them.

The top Google search in California with the word “Muslims” in it at the time was “kill Muslims.” And overall, Americans searched for the phrase “kill Muslims” with about the same frequency that they searched for “martini recipe,” “migraine symptoms,” and “Cowboys roster.” In the days following the San Bernardino attack, for every American concerned with “Islamophobia,” another was searching for “kill Muslims.” While hate searches were approximately 20 percent of all searches about Muslims before the attack, more than half of all search volume about Muslims became hateful in the hours that followed it.

And this minute-by-minute search data can tell us how difficult it can be to calm this rage. Four days after the shooting, then-president Obama gave a prime-time address to the country. He wanted to reassure Americans that the government could both stop terrorism and, perhaps more important, quiet this dangerous Islamophobia.

Obama appealed to our better angels, speaking of the importance of inclusion and tolerance. The rhetoric was powerful and moving. The Los Angeles Times praised Obama for “[warning] against allowing fear to cloud our judgment.” The New York Times called the speech both “tough” and “calming.” The website Think Progress praised it as “a necessary tool of good governance, geared towards saving the lives of Muslim Americans.” Obama’s speech, in other words, was judged a major success. But was it?

Google search data suggests otherwise. Together with Evan Soltas, then at Princeton, I examined the data. In his speech, the president said, “It is the responsibility of all Americans—of every faith—to reject discrimination.” But searches calling Muslims “terrorists,” “bad,” “violent,” and “evil” doubled during and shortly after the speech. President Obama also said, “It is our responsibility to reject religious tests on who we admit into this country.” But negative searches about Syrian refugees, a mostly Muslim group then desperately looking for a safe haven, rose 60 percent, while searches asking how to help Syrian refugees dropped 35 percent. Obama asked Americans to “not forget that freedom is more powerful than fear.” Yet searches for “kill Muslims” tripled during his speech. In fact, just about every negative search we could think to test regarding Muslims shot up during and after Obama’s speech, and just about every positive search we could think to test declined.

In other words, Obama seemed to say all the right things. All the traditional media congratulated Obama on his healing words. But new data from the internet, offering digital truth serum, suggested that the speech actually backfired in its main goal. Instead of calming the angry mob, as everybody thought he was doing, the internet data tells us that Obama actually inflamed it. Things that we think are working can have the exact opposite effect from the one we expect. Sometimes we need internet data to correct our instinct to pat ourselves on the back.

So what should Obama have said to quell this particular form of hatred currently so virulent in America? We’ll circle back to that later. Right now we’re going to take a look at an age-old vein of prejudice in the United States, the form of hate that in fact stands out above the rest, the one that has been the most destructive and the topic of the research that began this book. In my work with Google search data, the single most telling fact I have found regarding hate on the internet is the popularity of the word “nigger.”

Either singular or in its plural form, the word “nigger” is included in seven million American searches every year. (Again, the word used in rap songs is almost always “nigga,” not “nigger,” so there’s no significant impact from hip-hop lyrics to account for.) Searches for “nigger jokes” are seventeen times more common than searches for “kike jokes,” “gook jokes,” “spic jokes,” “chink jokes,” and “fag jokes” combined.

When are searches for “nigger(s)”—or “nigger jokes”—most common? Whenever African-Americans are in the news. Among the periods when such searches were highest was the immediate aftermath of Hurricane Katrina, when television and newspapers showed images of desperate black people in New Orleans struggling for their survival. They also shot up during Obama’s first election. And searches for “nigger jokes” rise on average about 30 percent on Martin Luther King Jr. Day.

The frightening ubiquity of this racial slur throws into doubt some current understandings of racism.

Any theory of racism has to explain a big puzzle in America. On the one hand, the overwhelming majority of black Americans think they suffer from prejudice—and they have ample evidence of discrimination in police stops, job interviews, and jury decisions. On the other hand, very few white Americans will admit to being racist.

The dominant explanation among political scientists recently has been that this is due, in large part, to widespread implicit prejudice. White Americans may mean well, this theory goes, but they have a subconscious bias, which influences their treatment of black Americans. Academics invented an ingenious way to test for such a bias. It is called the implicit-association test.

The tests have consistently shown that it takes most people milliseconds longer to associate black faces with positive words, such as “good,” than with negative words, such as “awful.” For white faces, the pattern is reversed. The extra time it takes is evidence of someone’s implicit prejudice—a prejudice the person may not even be aware of.

There is, though, an alternative explanation for the discrimination that African-Americans feel and whites deny: hidden explicit racism. Suppose there is a reasonably widespread conscious racism of which people are very much aware but to which they won’t confess—certainly not in a survey. That’s what the search data seems to be saying. There is nothing implicit about searching for “nigger jokes.” And it’s hard to imagine that Americans are Googling the word “nigger” with the same frequency as “migraine” and “economist” without explicit racism having a major impact on African-Americans. Prior to the Google data, we didn’t have a convincing measure of this virulent animus. Now we do. We are, therefore, in a position to see what it explains.

It explains, as discussed earlier, why Obama’s vote totals in 2008 and 2012 were depressed in many regions. It also correlates with the black-white wage gap, as a team of economists recently reported. The areas that I had found make the most racist searches, in other words, underpay black people. And then there is the phenomenon of Donald Trump’s candidacy. As noted in the introduction, when Nate Silver, the polling guru, looked for the geographic variable that correlated most strongly with support in the 2016 Republican primary for Trump, he found it in the map of racism I had developed. That variable was searches for “nigger(s).”

Scholars have recently put together a state-by-state measure of implicit prejudice against black people, which has enabled me to compare the effects of explicit racism, as measured by Google searches, and implicit bias. For example, I tested how much each worked against Obama in both of his presidential elections. Using regression analysis, I found that, to predict where Obama underperformed, an area’s racist Google searches explained a lot. An area’s performance on implicit-association tests added little.

To be provocative and to encourage more research in this area, let me put forth the following conjecture, ready to be tested by scholars across a range of fields. The primary explanation for discrimination against African Americans today is not the fact that the people who agree to participate in lab experiments make subconscious associations between negative words and black people; it is the fact that millions of white Americans continue to do things like search for “nigger jokes.”

The discrimination black people regularly experience in the United States appears to be fueled more widely by explicit, if hidden, hostility. But, for other groups, subconscious prejudice may have a more fundamental impact. For example, I was able to use Google searches to find evidence of implicit prejudice against another segment of the population: young girls.

And who, might you ask, would be harboring bias against girls?

Their parents.

It’s hardly surprising that parents of young children are often excited by the thought that their kids might be gifted. In fact, of all Google searches starting “Is my 2-year-old,” the most common next word is “gifted.” But this question is not asked equally about young boys and young girls. Parents are two and a half times more likely to ask “Is my son gifted?” than “Is my daughter gifted?” Parents show a similar bias when using other phrases related to intelligence that they may shy away from saying aloud, like, “Is my son a genius?”

Are parents picking up on legitimate differences between young girls and boys? Perhaps young boys are more likely than young girls to use big words or otherwise show objective signs of giftedness? Nope. If anything, it’s the opposite. At young ages, girls have consistently been shown to have larger vocabularies and use more complex sentences. In American schools, girls are 9 percent more likely than boys to be in gifted programs. Despite all this, parents looking around the dinner table appear to see more gifted boys than girls.* In fact, on every search term related to intelligence I tested, including those indicating its absence, parents were more likely to be inquiring about their sons than about their daughters. There are also more searches for “is my son behind” or “stupid” than comparable searches for daughters. But searches with negative words like “behind” and “stupid” are less specifically skewed toward sons than searches with positive words, such as “gifted” or “genius.”

What then are parents’ overriding concerns regarding their daughters? Primarily, anything related to appearance. Consider questions about a child’s weight. Parents Google “Is my daughter overweight?” roughly twice as frequently as they Google “Is my son overweight?” Parents are about twice as likely to ask how to get their daughters to lose weight as they are to ask how to get their sons to do the same. Just as with giftedness, this gender bias is not grounded in reality. About 28 percent of girls are overweight, while 35 percent of boys are. Even though scales measure more overweight boys than girls, parents see—or worry about—overweight girls much more frequently than overweight boys.

Parents are also one and a half times more likely to ask whether their daughter is beautiful than whether their son is handsome. And they are nearly three times more likely to ask whether their daughter is ugly than whether their son is ugly. (How Google is expected to know whether a child is beautiful or ugly is hard to say.)

In general, parents seem more likely to use positive words in questions about sons. They are more apt to ask whether a son is “happy” and less apt to ask whether a son is “depressed.”

Liberal readers may imagine that these biases are more common in conservative parts of the country, but I didn’t find any evidence of that. In fact, I did not find a significant relationship between any of these biases and the political or cultural makeup of a state. Nor is there evidence that these biases have decreased since 2004, the year for which Google search data is first available. It would seem this bias against girls is more widespread and deeply ingrained than we’d care to believe.

Sexism is not the only place our stereotypes about prejudice may be off.

VikingMaiden88 is twenty-six years old. She enjoys reading history and writing poetry. Her signature quote is from Shakespeare. I gleaned all this from her profile and posts on Stormfront.org, America’s most popular online hate site. I also learned that VikingMaiden88 has enjoyed the content on the site of the newspaper I work for, the New York Times. She wrote an enthusiastic post about a particular Times feature.

I recently analyzed tens of thousands of such Stormfront profiles, in which registered members can enter their location, birth date, interests, and other information.

Stormfront was founded in 1995 by Don Black, a former Ku Klux Klan leader. Its most popular “social groups” are “Union of National Socialists” and “Fans and Supporters of Adolf Hitler.” Over the past year, according to Quantcast, roughly 200,000 to 400,000 Americans visited the site every month. A recent Southern Poverty Law Center report linked nearly one hundred murders in the past five years to registered Stormfront members.

Stormfront members are not whom I would have guessed.

They tend to be young, at least according to self-reported birth dates. The most common age at which people join the site is nineteen. And four times more nineteen-year-olds sign up than forty-year-olds. Internet and social network users lean young, but not nearly that young.

Profiles do not have a field for gender. But I looked at all the posts and complete profiles of a random sample of American users, and it turns out that you can work out the gender of most of the membership: I estimate that about 30 percent of Stormfront members are female.

The states with the most members per capita are Montana, Alaska, and Idaho. These states tend to be overwhelmingly white. Does this mean that growing up with little diversity fosters hate?

Probably not. Rather, since those states have a higher proportion of non-Jewish white people, they have more potential members for a group that attacks Jews and nonwhites. The percentage of Stormfront’s target audience that joins is actually higher in areas with more minorities. This is particularly true when you look at Stormfront’s members who are eighteen and younger and therefore do not themselves choose where they live.

Among this age group, California, a state with one of the largest minority populations, has a membership rate 25 percent higher than the national average.
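The difference between members per capita and members per potential recruit is easy to miss, so here is a minimal sketch of the arithmetic. The two states and all of their counts are made up for illustration; they are not Stormfront figures, and `membership_rates` is a hypothetical helper.

```python
def membership_rates(members, population, target_population):
    """Return (members per million residents,
               members per million people in the target audience)."""
    per_capita = members * 1_000_000 / population
    per_target = members * 1_000_000 / target_population
    return per_capita, per_target


# Hypothetical State A: 90 percent of residents are in the target audience.
state_a = membership_rates(members=90, population=1_000_000,
                           target_population=900_000)

# Hypothetical State B: only 50 percent of residents are in the target audience.
state_b = membership_rates(members=60, population=1_000_000,
                           target_population=500_000)

print(state_a)  # (90.0, 100.0)
print(state_b)  # (60.0, 120.0)
```

In this invented example, State A has more members per capita, yet a larger share of State B's potential members has actually joined. Which state looks worse depends entirely on the denominator.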

One of the most popular social groups on the site is “In Support of Anti-Semitism.” The percentage of members who join this group is positively correlated with a state’s Jewish population. New York, the state with the highest Jewish population, has above-average per capita membership in this group.

In 2001, Dna88 joined Stormfront, describing himself as a “good looking, racially aware” thirty-year-old Internet developer living in “Jew York City.” In the next four months, he wrote more than two hundred posts, like “Jewish Crimes Against Humanity” and “Jewish Blood Money,” and directed people to a website, jewwatch.com, which claims to be a “scholarly library” on “Zionist criminality.”

Stormfront members complain about minorities’ speaking different languages and committing crimes. But what I found most interesting were the complaints about competition in the dating market.

A man calling himself William Lyon Mackenzie King, after a former prime minister of Canada who once suggested that “Canada should remain a white man’s country,” wrote in 2003 that he struggled to “contain” his “rage” after seeing a white woman “carrying around her half black ugly mongrel niglet.” In her profile, Whitepride26, a forty-one-year-old student in Los Angeles, says, “I dislike blacks, Latinos, and sometimes Asians, especially when men find them more attractive” than “a white female.”

Certain political developments play a role. The day that saw the biggest single increase in membership in Stormfront’s history, by far, was November 5, 2008, the day after Barack Obama was elected president. There was, however, no increased interest in Stormfront during Donald Trump’s candidacy and only a small rise immediately after he won. Trump rode a wave of white nationalism. There is no evidence here that he created a wave of white nationalism.

Obama’s election led to a surge in the white nationalist movement. Trump’s election seems to be a response to that.

One thing that does not seem to matter: economics. There was no relationship between monthly membership registration and a state’s unemployment rate. States disproportionately affected by the Great Recession saw no comparative increase in Google searches for Stormfront.

But perhaps what was most interesting—and surprising—were some of the topics of conversation Stormfront members have. They are similar to those my friends and I talk about. Maybe it was my own naïveté, but I would have imagined white nationalists inhabiting a different universe from that of my friends and me. Instead they have long threads praising Game of Thrones and discussing the comparative merits of online dating sites, like PlentyOfFish and OkCupid.

And the key fact that shows that Stormfront users inhabit a universe similar to that of people like me and my friends: the popularity of the New York Times among Stormfront users. It isn’t just VikingMaiden88 hanging around the Times site. The site is popular among many of its members. In fact, when you compare Stormfront users to people who visit the Yahoo News site, it turns out that the Stormfront crowd is twice as likely to visit nytimes.com.

Members of a hate site perusing the oh-so-liberal nytimes.com? How could this possibly be? If a substantial number of Stormfront members get their news from nytimes.com, it means our conventional wisdom about white nationalists is wrong. It also means our conventional wisdom about how the internet works is wrong.

THE TRUTH ABOUT THE INTERNET

The internet, most everybody agrees, is driving Americans apart, causing most people to hole up in sites geared toward people like them. Here’s how Cass Sunstein of Harvard Law School described the situation: “Our communications market is rapidly moving [toward a situation where] people restrict themselves to their own points of view—liberals watching and reading mostly or only liberals; moderates, moderates; conservatives, conservatives; Neo-Nazis, Neo-Nazis.”

This view makes sense. After all, the internet gives us a virtually unlimited number of options from which we can consume the news. I can read whatever I want. You can read whatever you want. VikingMaiden88 can read whatever she wants. And people, if left to their own devices, tend to seek out viewpoints that confirm what they believe. Thus, surely, the internet must be creating extreme political segregation.

There is one problem with this standard view. The data tells us that it is simply not true.

The evidence against this piece of conventional wisdom comes from a 2011 study by Matt Gentzkow and Jesse Shapiro, two economists whose work we discussed earlier.

Gentzkow and Shapiro collected data on the browsing behavior of a large sample of Americans. Their dataset also included the ideology—self-reported—of their subjects: whether people considered themselves more liberal or conservative. They used this data to measure the political segregation on the internet.

How? They performed an interesting thought experiment.

Suppose you randomly sampled two Americans who happen to both be visiting the same news website. What is the probability one of them will be liberal and the other conservative? How frequently, in other words, do liberals and conservatives “meet” on news sites?

To think about this further, suppose liberals and conservatives on the internet never got their online news from the same place. In other words, liberals exclusively visited liberal websites, conservatives exclusively conservative ones. If this were the case, the chances that two Americans on a given news site have opposing political views would be 0 percent. The internet would be perfectly segregated. Liberals and conservatives would never mix.

Suppose, in contrast, that liberals and conservatives did not differ at all in how they got their news. In other words, a liberal and a conservative were equally likely to visit any particular news site. If this were the case, the chances that two Americans on a given news website have opposing political views would be roughly 50 percent. The internet would be perfectly desegregated. Liberals and conservatives would perfectly mix.

So what does the data tell us? In the United States, according to Gentzkow and Shapiro, the chance that two people visiting the same news site have different political views is about 45 percent. In other words, the internet is far closer to perfect desegregation than perfect segregation. Liberals and conservatives are “meeting” each other on the web all the time.
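The thought experiment can be made concrete with a few lines of Python. This is a simplified sketch, not Gentzkow and Shapiro's actual estimator: the function name `opposing_view_probability` and the visitor counts are hypothetical, and it assumes that every visitor is either liberal or conservative and that a pair is drawn by picking a random visit and then a second random visitor from the same site.

```python
def opposing_view_probability(sites):
    """sites: list of (liberal_visitors, conservative_visitors), one per news site.

    Returns the chance that two visitors drawn from the same site
    hold opposing political views, weighting each site by its traffic.
    """
    total_visitors = sum(lib + con for lib, con in sites)
    probability = 0.0
    for lib, con in sites:
        visitors = lib + con
        if visitors == 0:
            continue  # a site nobody visits contributes nothing
        p_lib = lib / visitors
        # Chance of one liberal and one conservative on this site: 2 * p * (1 - p)
        site_mix = 2 * p_lib * (1 - p_lib)
        probability += (visitors / total_visitors) * site_mix
    return probability


# Perfect segregation: liberals and conservatives never share a site.
segregated = [(100, 0), (0, 100)]
# Perfect mixing: every site draws both groups equally.
mixed = [(50, 50), (50, 50)]

print(opposing_view_probability(segregated))  # 0.0
print(opposing_view_probability(mixed))       # 0.5
```

Under these assumptions the two made-up extremes reproduce the 0 percent and 50 percent benchmarks, and a reported figure of about 45 percent sits close to the fully mixed end of that range.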

What really puts the lack of segregation on the internet in perspective is comparing it to segregation in other parts of our lives. Gentzkow and Shapiro could repeat their analysis for various offline interactions. What are the chances that two family members have different political views? Two neighbors? Two colleagues? Two friends?

Using data from the General Social Survey, Gentzkow and Shapiro found that all these numbers were lower than the chances that two people on the same news website have different politics.

PROBABILITY THAT SOMEONE YOU MEET HAS OPPOSING POLITICAL VIEWS

On a News Website    45.2%
Coworker             41.6%
Offline Neighbor     40.3%
Family Member        37%
Friend               34.7%

In other words, you are more likely to come across someone with opposing views online than you are offline.

Why isn’t the internet more segregated? There are two factors that limit political segregation on the internet.

First, somewhat surprisingly, the internet news industry is dominated by a few massive sites. We usually think of the internet as appealing to the fringes. Indeed, there are sites for everybody, no matter your viewpoints. There are landing spots for pro-gun and anti-gun crusaders, cigar rights and dollar coin activists, anarchists and white nationalists. But these sites together account for a small fraction of the internet’s news traffic. In fact, in 2009, four sites—Yahoo News, AOL News, msnbc.com, and cnn.com—collected more than half of news views. Yahoo News remains the most popular news site among Americans, with close to 90 million unique monthly visitors—or some 600 times Stormfront’s audience. Mass media sites like Yahoo News appeal to a broad, politically diverse audience.

The second reason the internet isn’t all that segregated is that many people with strong political opinions visit sites of the opposite viewpoint, if only to get angry and argue. Political junkies do not limit themselves only to sites geared toward them. Someone who visits thinkprogress.org and moveon.org—two extremely liberal sites—is more likely than the average internet user to visit foxnews.com, a right-leaning site. Someone who visits rushlimbaugh.com or glennbeck.com—two extremely conservative sites—is more likely than the average internet user to visit nytimes.com, a more liberal site.

Gentzkow and Shapiro’s study was based on data from 2004–09, relatively early in the history of the internet. Might the internet have grown more compartmentalized since then? Have social media and, in particular, Facebook altered their conclusion? Clearly, if our friends tend to share our political views, the rise of social media should mean a rise of echo chambers. Right?

Again, the story is not so simple. While it is true that people’s friends on Facebook are more likely than not to share their political views, a team of data scientists—Eytan Bakshy, Solomon Messing, and Lada Adamic—have found that a surprising amount of the information people get on Facebook comes from people with opposing views.

How can this be? Don’t our friends tend to share our political views? Indeed, they do. But there is one crucial reason that Facebook may lead to a more diverse political discussion than offline socializing. People, on average, have substantially more friends on Facebook than they do offline. And these weak ties facilitated by Facebook are more likely to be people with opposite political views.

In other words, Facebook exposes us to weak social connections—the high school acquaintance, the crazy third cousin, the friend of the friend of the friend you sort of, kind of, maybe know. These are people you might never go bowling with or to a barbecue with. You might not invite them over to a dinner party. But you do Facebook friend them. And you do see their links to articles with views you might have never otherwise considered.

In sum, the internet actually brings people of different political views together. The average liberal may spend her morning with her liberal husband and liberal kids; her afternoon with her liberal coworkers; her commute surrounded by liberal bumper stickers; her evening with her liberal yoga classmates. When she comes home and peruses a few conservative comments on cnn.com or gets a Facebook link from her Republican high school acquaintance, this may be her highest conservative exposure of the day.

I probably never encounter white nationalists in my favorite coffee shop in Brooklyn. But VikingMaiden88 and I both frequent the New York Times site.

THE TRUTH ABOUT CHILD ABUSE AND ABORTION

The internet can give us insights into not just disturbing attitudes but also disturbing behaviors. Indeed, Google data may be effective at alerting us to crises that are missed by all the usual sources. People, after all, turn to Google when they are in trouble.

Consider child abuse during the Great Recession.

When this major economic downturn started in late 2007, many experts were naturally worried about the effect it might have on children. After all, many parents would be stressed and depressed, and these are major risk factors for maltreatment. Child abuse might skyrocket.

Then the official data came in, and it seemed that the worry was unfounded. Child protective service agencies reported that they were getting fewer cases of abuse. Further, these drops were largest in states that were hardest hit by the recession. “The doom-and-gloom predictions haven’t come true,” Richard Gelles, a child welfare expert at the University of Pennsylvania, told the Associated Press in 2011. Yes, as counterintuitive as it may have seemed, child abuse seemed to have plummeted during the recession.

But did child abuse really drop with so many adults out of work and extremely distressed? I had trouble believing this. So I turned to Google data.

It turns out, some kids make some tragic, and heart-wrenching, searches on Google—such as “my mom beat me” or “my dad hit me.” And these searches present a different—and agonizing—picture of what happened during this time. The number of searches like this shot up during the Great Recession, closely tracking the unemployment rate.

Here’s what I think happened: it was the reporting of child abuse cases that declined, not the child abuse itself. After all, it is estimated that only a small percentage of child abuse cases are reported to authorities anyway. And during a recession, many of the people who tend to report child abuse cases (teachers and police officers, for example) and handle cases (child protective service workers) are more likely to be overworked or out of work.

During the economic downturn, there were many stories of people who tried to report potential cases, faced long wait times, and gave up.

Indeed, there is more evidence, this time not from Google, that child abuse actually rose during the recession. When a child dies due to abuse or neglect, it has to be reported. Such deaths, although rare, did rise in states that were hardest hit by the recession.

And there is some evidence from Google that more people were suspecting abuse in hard-hit areas. Controlling for pre-recession rates and national trends, states that had comparatively suffered the most had increased search rates for child abuse and neglect. For every percentage point increase in the unemployment rate, there was an associated 3 percent increase in the search rate for “child abuse” or “child neglect.” Presumably, most of these people never successfully reported the abuse, as these states had the biggest drops in the reporting.

Searches by suffering kids increase. The rate of child deaths spikes. Searches by people suspecting abuse go up in hard-hit states. But reporting of cases goes down. A recession seems to cause more kids to tell Google that their parents are hitting or beating them and more people to suspect that they see abuse. But the overworked agencies are able to handle fewer cases.

I think it’s safe to say that the Great Recession did make child abuse worse, although the traditional measures did not show it.

Now, anytime I suspect people may be suffering off the books, I turn to Google data. One of the potential benefits of this new data, and knowing how to interpret it, is the possibility of helping vulnerable people who might otherwise go overlooked by authorities.

So when the Supreme Court was recently looking into the effects of laws making it more difficult to get an abortion, I turned to the query data. I suspected women affected by this legislation might look into off-the-books ways to terminate a pregnancy. They did. And these searches were highest in states that had passed laws restricting abortions.

The search data here is both useful and troubling.

In 2015, in the United States, there were more than 700,000 Google searches looking into self-induced abortions. By comparison, there were some 3.4 million searches for abortion clinics that year. This suggests that a significant percentage of women considering an abortion have contemplated doing it themselves.
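To put the two figures above on a common scale, a quick back-of-the-envelope calculation helps. The sketch below uses only the rounded counts quoted in the text; the variable names are illustrative. Because a search is not a person, the ratio is a rough gauge of interest rather than a count of women.

```python
# Rounded 2015 U.S. search counts quoted in the text.
self_induced_searches = 700_000   # "more than 700,000"
clinic_searches = 3_400_000       # "some 3.4 million"

# Self-induced-abortion searches per 100 abortion-clinic searches.
per_hundred = 100 * self_induced_searches / clinic_searches
print(f"{per_hundred:.0f} self-induced searches per 100 clinic searches")  # 21
```

On these rounded figures, there was roughly one search about self-induced abortion for every five searches for a clinic.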

Women searched, about 160,000 times, for ways of getting abortion pills through unofficial channels—“buy abortion pills online” and “free abortion pills.” They asked Google about abortion by herbs like parsley or by vitamin C. There were some 4,000 searches looking for directions on coat hanger abortions, including about 1,300 for the exact phrase “how to do a coat hanger abortion.” There were also a few hundred looking into abortion through bleaching one’s uterus and punching one’s stomach.

What drives interest in self-induced abortion? The geography and timing of the Google searches point to a likely culprit: when it’s hard to get an official abortion, women look into off-the-books approaches.

Search rates for self-induced abortion were fairly steady from 2004 through 2007. They began to rise in late 2008, coinciding with the financial crisis and the recession that followed. They took a big leap in 2011, jumping 40 percent. The Guttmacher Institute, a reproductive rights organization, singles out 2011 as the beginning of the country’s recent crackdown on abortion; ninety-two state provisions that restrict access to abortion were enacted. Looking by comparison at Canada, which has not seen a crackdown on reproductive rights, there was no comparable increase in searches for self-induced abortions during this time.

The state with the highest rate of Google searches for self-induced abortions is Mississippi, a state with roughly three million people and, now, just one abortion clinic. Eight of the ten states with the highest search rates for self-induced abortions are considered by the Guttmacher Institute to be hostile or very hostile to abortion. None of the ten states with the lowest search rates for self-induced abortion are in either category.

Of course, we cannot know from Google searches how many women successfully give themselves abortions, but evidence suggests that a significant number may. One way to illuminate this is to compare abortion and birth data.

In 2011, the last year with complete state-level abortion data, women living in states with few abortion clinics had many fewer legal abortions.

Compare the ten states with the most abortion clinics per capita (a list that includes New York and California) to the ten states with the fewest abortion clinics per capita (a list that includes Mississippi and Oklahoma). Women living in states with the fewest abortion clinics had 54 percent fewer legal abortions—a difference of eleven abortions for every thousand women between the ages of fifteen and forty-four. Women living in states with the fewest abortion clinics also had more live births. However, the difference was not enough to make up for the lower number of abortions. There were six more live births for every thousand women of childbearing age.

In other words, there appear to have been some missing pregnancies in parts of the country where it was hardest to get an abortion. The official sources don’t tell us what happened to those five missing births for each thousand women in states where it is hard to get an abortion.
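The arithmetic behind the "missing" figure is simple enough to check directly. This is a minimal sketch using only the per-thousand differences quoted above; the variable names are illustrative.

```python
# Differences between the ten states with the fewest clinics per capita and
# the ten with the most, per 1,000 women aged 15 to 44 (figures from the text).
fewer_legal_abortions = 11
additional_live_births = 6

# Pregnancies that show up neither as legal abortions nor as live births.
unaccounted_per_thousand = fewer_legal_abortions - additional_live_births
print(unaccounted_per_thousand)  # 5
```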

Google provides some pretty good clues.

We can’t blindly trust government data. The government may tell us that child abuse or abortion has fallen and politicians may celebrate this achievement. But the results we think we’re seeing may be an artifact of flaws in the methods of data collection. The truth may be different—and, sometimes, far darker.

THE TRUTH ABOUT YOUR FACEBOOK FRIENDS

This book is about Big Data, in general. But this chapter has mostly emphasized Google searches, which I have argued reveal a hidden world very different from the one we think we see. So are other Big Data sources digital truth serum, as well? The fact is, many Big Data sources, such as Facebook, are often the opposite of digital truth serum.

On social media, as in surveys, you have no incentive to tell the truth. On social media, much more so than in surveys, you have a large incentive to make yourself look good. Your online presence is not anonymous, after all. You are courting an audience and telling your friends, family members, colleagues, acquaintances, and strangers who you are.

To see how biased data pulled from social media can be, consider the relative popularity of the Atlantic, a respected, highbrow monthly magazine, versus the National Enquirer, a gossipy, often-sensational magazine. Both publications have similar average circulations, selling a few hundred thousand copies. (The National Enquirer is a weekly, so it actually sells more total copies.) There are also a comparable number of Google searches for each magazine.

However, on Facebook, roughly 1.5 million people either like the Atlantic or discuss articles from the Atlantic on their profiles. Only about 50,000 like the Enquirer or discuss its contents.

ATLANTIC VS. NATIONAL ENQUIRER POPULARITY COMPARED BY DIFFERENT SOURCES

Circulation: Roughly 1 Atlantic for every 1 National Enquirer

Google Searches: 1 Atlantic for every 1 National Enquirer

Facebook Likes: 27 Atlantic for every 1 National Enquirer

For assessing magazine popularity, circulation data is the ground truth. Google data comes close to matching it. And Facebook data is overwhelmingly biased against the trashy tabloid, making it the worst data for determining what people really like.

And as with reading preferences, so with life. On Facebook, we show our cultivated selves, not our true selves. I use Facebook data in this book, in fact in this chapter, but always with this caveat in mind.

To gain a better understanding of what social media misses, let’s return to pornography for a moment. First, we need to address the common belief that the internet is dominated by smut. This isn’t true. The majority of content on the internet is nonpornographic. For instance, of the top ten most visited websites, not one is pornographic. So the popularity of porn, while enormous, should not be overstated.

That said, a close look at how we like and share pornography makes it clear that Facebook, Instagram, and Twitter provide only a limited window into what’s truly popular on the internet. There are large subsets of the web that operate with massive popularity but little social presence.

The most popular video of all time, as of this writing, is Psy’s “Gangnam Style,” a goofy pop music video that satirizes trendy Koreans. It’s been viewed about 2.3 billion times on YouTube alone since its debut in 2012. And its popularity is clear no matter what site you are on. It’s been shared across different social media platforms tens of millions of times.

The most popular pornographic video of all time may be “Great Body, Great Sex, Great Blowjob.” It’s been viewed more than 80 million times. In other words, for every thirty views of “Gangnam Style,” there has been at least one view of “Great Body, Great Sex, Great Blowjob.” If social media gave us an accurate view of the videos people watched, “Great Body, Great Sex, Great Blowjob” should be posted millions of times. But this video has been shared on social media only a few dozen times and always by porn stars, not by average users. People clearly do not feel the need to advertise their interest in this video to their friends.

Facebook is digital brag-to-my-friends-about-how-good-my-life-is serum. In Facebook world, the average adult seems to be happily married, vacationing in the Caribbean, and perusing the Atlantic. In the real world, a lot of people are angry, on supermarket checkout lines, peeking at the National Enquirer, ignoring the phone calls from their spouse, whom they haven’t slept with in years. In Facebook world, family life seems perfect. In the real world, family life is messy. It can occasionally be so messy that a small number of people even regret having children. In Facebook world, it seems every young adult is at a cool party Saturday night. In the real world, most are home alone, binge-watching shows on Netflix. In Facebook world, a girlfriend posts twenty-six happy pictures from her getaway with her boyfriend. In the real world, immediately after posting this, she Googles “my boyfriend won’t have sex with me.” And, perhaps at the same time, the boyfriend watches “Great Body, Great Sex, Great Blowjob.”

DIGITAL TRUTH

•  Searches

•  Views

•  Clicks

•  Swipes

DIGITAL LIES

•  Social media posts

•  Social media likes

•  Dating profiles

THE TRUTH ABOUT YOUR CUSTOMERS

In the early morning of September 5, 2006, Facebook introduced a major update to its home page. The early versions of Facebook had only allowed users to click on profiles of their friends to learn what they were doing. The website, considered a big success, had at the time 9.4 million users.

But after months of hard work, engineers had created something they called “News Feed,” which would provide users with updates on the activities of all their friends.

Users immediately reported that they hated News Feed. Ben Parr, a Northwestern undergraduate, created “Students Against Facebook news feed.” He said that “news feed is just too creepy, too stalker-esque, and a feature that has to go.” Within a few days, the group had 700,000 members echoing Parr’s sentiment. One University of Michigan junior told the Michigan Daily, “I’m really creeped out by the new Facebook. It makes me feel like a stalker.”

David Kirkpatrick tells this story in his authorized account of the website’s history, The Facebook Effect: The Inside Story of the Company That Is Connecting the World. He dubs the introduction of News Feed “the biggest crisis Facebook has ever faced.” But Kirkpatrick reports that when he interviewed Mark Zuckerberg, cofounder and head of the rapidly growing company, the CEO was unfazed.

The reason? Zuckerberg had access to digital truth serum: numbers on people’s clicks and visits to Facebook. As Kirkpatrick writes:

Zuckerberg in fact knew that people liked the News Feed, no matter what they were saying in the groups. He had the data to prove it. People were spending more time on Facebook, on average, than before News Feed launched. And they were doing more there—dramatically more. In August, users viewed 12 billion pages on the service. But by October, with News Feed under way, they viewed 22 billion.

And that was not all the evidence at Zuckerberg’s disposal. Even the viral popularity of the anti–News Feed group was evidence of the power of News Feed. The group was able to grow so rapidly precisely because so many people had heard that their friends had joined—and they learned this through their News Feed.

In other words, while people were joining in a big public uproar over how unhappy they were about seeing all the details of their friends’ lives on Facebook, they were coming back to Facebook to see all the details of their friends’ lives. News Feed stayed. Facebook now has more than one billion daily active users.

In his book Zero to One, Peter Thiel, an early investor in Facebook, says that great businesses are built on secrets, either secrets about nature or secrets about people. Jeff Seder, as discussed in Chapter 3, found the natural secret that left ventricle size predicted horse performance. Google found the natural secret of how powerful the information in links can be.

Thiel defines “secrets about people” as “things that people don’t know about themselves or things they hide because they don’t want others to know.” These kinds of businesses, in other words, are built on people’s lies.

You could argue that all of Facebook is founded on an unpleasant secret about people that Zuckerberg learned while at Harvard. Zuckerberg, early in his sophomore year, created a website for his fellow students called Facemash. Modeled on a site called “Am I Hot or Not?,” Facemash would present pictures of two Harvard students and then have other students judge who was better looking.

The sophomore’s site was greeted with outrage. The Harvard Crimson, in an editorial, accused young Zuckerberg of “catering to the worst side” of people. Hispanic and African-American groups accused him of sexism and racism. Yet, before Harvard administrators shut down Zuckerberg’s internet access—just a few hours after the site was founded—450 people had viewed the site and voted 22,000 times on different images. Zuckerberg had learned an important secret: people can claim they’re furious, they can decry something as distasteful, and yet they’ll still click.

And he learned one more thing: for all their professions of seriousness, responsibility, and respect for others’ privacy, people, even Harvard students, had a great interest in evaluating people’s looks. The views and votes told him that. And later—since Facemash proved too controversial—he took this knowledge of just how interested people could be in superficial facts about others they sort of knew and harnessed it into the most successful company of his generation.

Netflix learned a similar lesson early on in its life cycle: don’t trust what people tell you; trust what they do.

Originally, the company allowed users to create a queue of movies they wanted to watch in the future but didn’t have time for at the moment. This way, when they had more time, Netflix could remind them of those movies.

However, Netflix noticed something odd in the data. Users were filling their queues with plenty of movies. But days later, when they were reminded of the movies on the queue, they rarely clicked.

What was the problem? Ask users what movies they plan to watch in a few days, and they will fill the queue with aspirational, highbrow films, such as black-and-white World War II documentaries or serious foreign films. A few days later, however, they will want to watch the same movies they usually want to watch: lowbrow comedies or romance films. People were consistently lying to themselves.

Faced with this disparity, Netflix stopped asking people to tell them what they wanted to see in the future and started building a model based on millions of clicks and views from similar customers. The company began greeting its users with suggested lists of films based not on what they claimed to like but on what the data said they were likely to view. The result: customers visited Netflix more frequently and watched more movies.
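The shift described above, from stated preferences to observed behavior, can be illustrated with a toy recommender. This is a minimal sketch, not Netflix's actual system: the viewing histories, the customer names, and the `recommend` and `jaccard` helpers are all hypothetical, and Jaccard overlap is swapped in as one simple way to define "similar customers."

```python
from collections import Counter

# Hypothetical viewing histories: what people actually watched, not what they queued.
watched = {
    "ana":   {"lowbrow comedy A", "romance B", "lowbrow comedy C"},
    "ben":   {"lowbrow comedy A", "romance B", "romance D"},
    "chloe": {"war documentary E", "foreign drama F"},
    "you":   {"lowbrow comedy A", "romance B"},
}

def jaccard(a, b):
    """Overlap between two sets of titles: shared titles / all titles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, histories, top_n=2):
    """Rank unseen titles by the similarity-weighted views of other customers."""
    seen = histories[user]
    scores = Counter()
    for other, titles in histories.items():
        if other == user:
            continue
        weight = jaccard(seen, titles)
        for title in titles - seen:
            scores[title] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, score in ranked[:top_n] if score > 0]

print(recommend("you", watched))  # ['lowbrow comedy C', 'romance D']
```

In this toy data, the documentary and the foreign drama never surface, because no customer with similar viewing habits actually watched them.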

“The algorithms know you better than you know yourself,” says Xavier Amatriain, a former data scientist at Netflix.

CAN WE HANDLE THE TRUTH?

You may find parts of this chapter depressing. Digital truth serum has revealed an abiding interest in judging people based on their looks; the continued existence of millions of closeted gay men; a meaningful percentage of women fantasizing about rape; widespread animus against African-Americans; a hidden child abuse and self-induced abortion crisis; and an outbreak of violent Islamophobic rage that only got worse when the president appealed for tolerance. Not exactly cheery stuff. Often, after I give a talk on my research, people come up to me and say, “Seth, it’s all very interesting. But it’s so depressing.”

I can’t pretend there isn’t a darkness in some of this data. If people consistently tell us what they think we want to hear, we will generally be told things that are more comforting than the truth. Digital truth serum, on average, will show us that the world is worse than we have thought.

Do we need to know this? Learning about Google searches, porn data, and who clicks on what might not make you think, “This is great. We can understand who we really are.” You might instead think, “This is horrible. We can understand who we really are.”

But the truth helps—and not just for Mark Zuckerberg or others looking to attract clicks or customers. There are at least three ways that this knowledge can improve our lives.

First, there can be comfort in knowing that you are not alone in your insecurities and embarrassing behavior. It can be nice to know others are insecure about their bodies. It is probably nice for many people—particularly those who aren’t having much sex—to know the whole world isn’t fornicating like rabbits. And it may be valuable for a high school boy in Mississippi with a crush on the quarterback to know that, despite the low numbers of openly gay men around him, plenty of others feel the same kinds of attraction.

There’s another area—one I haven’t yet discussed—where Google searches can help show you are not alone. When you were young, a teacher may have told you that, if you have a question, you should raise your hand and ask it because if you’re confused, others are, too. If you were anything like me, you ignored your teacher’s advice and sat there silently, afraid to open your mouth. Your questions were too dumb, you thought; everyone else’s were more profound. The anonymous, aggregate Google data can tell us once and for all how right our teachers were. Plenty of basic, sub-profound questions lurk in other minds, too.

Consider the top questions Americans had during Obama’s 2014 State of the Union speech. (See the color photo at the end of the book.)

YOU’RE NOT THE ONLY ONE WONDERING: TOP GOOGLED QUESTIONS DURING THE STATE OF THE UNION

How old is Obama?

Who is sitting next to Biden?

Why is Boehner wearing a green tie?

Why is Boehner orange?

Now, you might read these questions and think they speak poorly of our democracy. To be more concerned about the color of someone’s tie or his skin tone than about the content of the president’s speech doesn’t reflect well on us. Not knowing who John Boehner, then the Speaker of the House of Representatives, is doesn’t say much for our political engagement either.

I prefer instead to think of such questions as demonstrating the wisdom of our teachers. These are the types of questions people usually don’t raise, because they sound too silly. But lots of people have them—and Google them.

In fact, I think Big Data can give a twenty-first-century update to a famous self-help quote: “Never compare your insides to everyone else’s outsides.”

A Big Data update may be: “Never compare your Google searches to everyone else’s social media posts.”

Compare, for example, the way that people describe their husbands on public social media and in anonymous searches.

TOP WAYS PEOPLE DESCRIBE THEIR HUSBANDS

SOCIAL MEDIA POSTS    SEARCHES
the best              gay
my best friend        a jerk
amazing               amazing
the greatest          annoying
so cute               mean

Since we see other people’s social media posts but not their searches, we tend to exaggerate how many women consistently think their husbands are “the best,” “the greatest,” and “so cute.”* We tend to minimize how many women think their husbands are “a jerk,” “annoying,” and “mean.” By analyzing anonymous and aggregate data, we may all understand that we’re not the only ones who find marriage, and life, difficult. We may learn to stop comparing our searches to everyone else’s social media posts.

The second benefit of digital truth serum is that it alerts us to people who are suffering. The Human Rights Campaign has asked me to work with them in helping educate men in certain states about the possibility of coming out of the closet. They are looking to use the anonymous and aggregate Google search data to help them decide where best to target their resources. Similarly, child protective service agencies have contacted me to learn in what parts of the country there may be far more child abuse than they are recording.

One surprising topic I was also contacted about: vaginal odors. When I first wrote about this in the New York Times, of all places, I did so in an ironic tone. The section made me, and others, chuckle.

However, when I later explored some of the message boards that come up when someone makes these searches, they included numerous posts from young girls convinced that their lives were ruined due to anxiety about vaginal odor. It’s no joke. Sex ed experts have contacted me, asking how they can best incorporate some of the internet data to reduce the paranoia among young girls.

While I feel a bit out of my depth on all these matters, they are serious, and I believe data science can help.

The final—and, I think, most powerful—value in this digital truth serum is indeed its ability to lead us from problems to solutions. With more understanding, we might find ways to reduce the world’s supply of nasty attitudes.

Let’s return to Obama’s speech about Islamophobia. Recall that every time Obama argued that people should respect Muslims more, the very people he was trying to reach became more enraged.

Google searches, however, reveal that there was one line that did trigger the type of response then-president Obama might have wanted. He said, “Muslim Americans are our friends and our neighbors, our co-workers, our sports heroes and, yes, they are our men and women in uniform, who are willing to die in defense of our country.”

After this line, for the first time in more than a year, the top Googled noun after “Muslim” was not “terrorists,” “extremists,” or “refugees.” It was “athletes,” followed by “soldiers.” And, in fact, “athletes” kept the top spot for a full day afterward.

When we lecture angry people, the search data implies that their fury can grow. But subtly provoking people’s curiosity, giving new information, and offering new images of the group that is stoking their rage may turn their thoughts in different, more positive directions.

Two months after that original speech, Obama gave another televised speech on Islamophobia, this time at a mosque. Perhaps someone in the president’s office had read Soltas’s and my Times column, which discussed what had worked and what hadn’t. For the content of this speech was noticeably different.

Obama spent little time insisting on the value of tolerance. Instead, he focused overwhelmingly on provoking people’s curiosity and changing their perceptions of Muslim Americans. Many of the slaves from Africa were Muslim, Obama told us; Thomas Jefferson and John Adams had their own copies of the Koran; the first mosque on U.S. soil was in North Dakota; a Muslim American designed skyscrapers in Chicago. Obama again spoke of Muslim athletes and armed service members but also talked of Muslim police officers and firefighters, teachers and doctors.

And my analysis of the Google searches suggests this speech was more successful than the previous one. Many of the hateful, rageful searches against Muslims dropped in the hours after the president’s address.

There are other potential ways to use search data to learn what causes, or reduces, hate. For example, we might look at how racist searches change after a black quarterback is drafted in a city or how sexist searches change after a woman is elected to office. We might see how racism responds to community policing or how sexism responds to new sexual harassment laws.

Learning of our subconscious prejudices can also be useful. For example, we might all make an extra effort to delight in little girls’ minds and show less concern with their appearance. Google search data and other wellsprings of truth on the internet give us an unprecedented look into the darkest corners of the human psyche. This is at times, I admit, difficult to face. But it can also be empowering. We can use the data to fight the darkness. Collecting rich data on the world’s problems is the first step toward fixing them.