INDEX

The pagination of this electronic edition does not match the edition from which it was created. To locate a specific entry, please use your e-book reader’s search tools.

A/B testing


ABCs of, 209–21

and addictions, 219–20

and Boston Globe headlines, 214–17

in digital world, 210–19

downside to, 219–21

and education/learning, 276

and Facebook, 211

future uses of, 276, 277, 278

and gaming industry, 220–21

and Google advertising, 217–19

importance of, 214, 217

and Jawbone, 277

and politics, 211–14

and television, 222

Abdulkadiroglu, Atila, 235–36

abortion, truth about, 147–50

Adamic, Lada, 144

Adams, John, 78

addictions

and A/B testing, 219–20

See also specific addiction

advertising

and A/B testing, 217–19

causal effects of, 221–25, 273

and examples of Big Data searches, 22

Google, 217–19

and Levitt-electronics company, 222, 225, 226

and movies, 224–25

and science, 273

and Super Bowl games, 221–26

TV, 221–26

African Americans

and Harvard Crimson editorial about Zuckerberg, 155

income and, 175

and origins of notable Americans, 182–83

and truth about hate and prejudice, 129, 134

See also “nigger”; race/racism

age

and baseball fans, 165–69, 165–66n

and lying, 108n

and origins of political preferences, 169–71

and predicting future of baseball players, 198–99

of Stormfront members, 137–38

and words as data, 85–86

See also children; teenagers

Aiden, Erez, 76–77, 78–79

alcohol

as addiction, 219

and health, 207–8

AltaVista (search engine), 60

Alter, Adam, 219–20

Amatriain, Xavier, 157

Amazon, 20, 203, 283

American Pharoah (Horse No. 85), 22, 64, 65, 70–71, 256

Angrist, Joshua, 235–36

anti-Semitism. See Jews

anxiety

data about, 18

and truth about sex, 123

AOL, and truth about sex, 117–18

AOL News, 143

art, real life as imitating, 190–97

Ashenfelter, Orley, 72–74

Asher, Sam, 202

Asians, and truth about hate and prejudice, 129

asking the right questions, 21–22

assassinations, 227–28

Atlantic magazine, 150–51, 152, 202

Australia, pregnancy in, 189

auto-complete, 110–11, 116

Avatar (movie), 221–22

Bakshy, Eytan, 144

Baltimore Ravens-New England Patriots games, 221, 222–24

baseball

and influence of childhood experiences, 165–69, 165–66n, 171, 206

and overemphasis on measurability, 254–55

predicting a player’s future in, 197–200, 200n, 203

and science, 273

scouting for, 254–55

zooming in on, 165–69, 165–66n, 171, 197–200, 200n, 203

basketball

pedigrees and, 67

predicting success in, 33–41, 67

and socioeconomic background, 34–41

Beane, Billy, 255

Beethoven, Ludwig van, zooming in on, 190–91

behavioral science, and digital revolution, 276, 279

Belushi, John, 185

Benson, Clark, 217

Berger, Jonah, 91–92

Bezos, Jeff, 203

bias

implicit, 134

language as key to understanding, 74–76

omitted-variable, 208

subconscious, 132

See also hate; prejudice; race/racism

Big Data

and amount of information, 15, 21, 59, 171

and asking the right questions, 21–22

and causality experiments, 54, 240

definition of, 14, 15

and dimensionality, 246–52

and examples of searches, 15–16

and expansion of research methodology, 275–76

and finishing books, 283–84

future of, 279

Google searches as dominant source of, 60

honesty of, 53–54

importance/value of, 17–18, 29–33, 59, 240, 265, 283

limitations of, 20, 245, 254–55, 256

powers of, 15, 17, 22, 53–54, 59, 109, 171, 211, 257

and predicting what people will do in future, 198–200

as revolutionary, 17, 18–22, 30, 62, 76, 256, 274

as right data, 62

skeptics of, 17

and small data, 255–56

subsets in, 54

understanding of, 27–28

See also specific topic

Bill & Melinda Gates Foundation, 255

Billings (Montana) Gazette, and words as data, 95

Bing (search engine), and Columbia University-Microsoft pancreatic cancer study, 28, 30

Black, Don, 137

Black Lives Matter, 12

Blink (Gladwell), 29–30

Bloodstock, Incardo, 64

bodies, as data, 62–74

Boehner, John, 160

Booking.com, 265

books

conclusions to, 271–72, 279, 280–84

digitalizing, 77, 79

number of people who finish, 283–84

borrowing money, 257–61

Bosh, Chris, 37

Boston Globe, and A/B testing, 214–17

Boston Marathon (2013), 19

Boston Red Sox, 197–200

brain, Minsky study of, 273

Brazil, pregnancy in, 190

breasts, and truth about sex, 125, 126

Brin, Sergey, 60, 61, 62, 103

Britain, pregnancy in, 189

Bronx Science High School (New York City), 232, 237

Buffett, Warren, 239

Bullock, Sandra, 185

Bundy, Ted, 181

Bush, George W., 67

business

and comparison shopping, 265

reviews of, 265

See also corporations

butt, and truth about sex, 125–26

Calhoun, Jim, 39

Cambridge University, and Microsoft study about IQ of Facebook users, 261

cancer, predicting pancreatic, 28–29, 30

Capital in the 21st Century (Piketty), 283

casinos, and price discrimination, 263–65

causality

A/B testing and, 209–21

and advertising, 221–25

and Big Data experiments, 54, 240

college and, 237–39

correlation distinguished from, 221–25

and ethics, 226

and monetary windfalls, 229

natural experiments and, 226–28

and power of Big Data, 54, 211

and randomized controlled experiments, 208–9

reverse, 208

and Stuyvesant High School study, 231–37, 240

Centers for Disease Control and Prevention, 57

Chabris, Christopher, 250

Chance, Zoë, 252–53

Chaplin, Charlie, 19

charitable giving, 106, 109

Chen, M. Keith, 235

Chetty, Raj, 172–73, 174–75, 176, 177, 178–80, 185, 273

children

abuse of, 145–47, 149–50, 161

and benefits of digital truth serum, 161

and child pornography, 121

decisions about having, 111–12

height and weight data about, 204–5

of immigrants, 184–85

and income distribution, 176

and influence of childhood experiences, 165–71, 165–66n, 206

intelligence of, 135

and origins of notable Americans, 184–85

parent prejudices against, 134–36, 135n

physical appearance of, 135–36

See also parents/parenting; teenagers

cholera, Snow study about, 275

Christians, and truth about hate and prejudice, 129

Churchill, Winston, 169

cigarette economy, Philippines, 102

cities

and danger of empowered government, 267, 268–69

predicting behavior of, 268–69

zooming in on, 172–90, 239–40

Civil War, 79

Clemens, Jeffrey, 230

Clinton, Bill, searches for, 60–62

Clinton, Hillary. See elections, 2016

A Clockwork Orange (movie), 190–91

cnn.com, 143, 145

Cohen, Leonard, 82n

college

and causality, 237–39

and examples of Big Data searches, 22

college towns, and origins of notable Americans, 182–83, 184, 186

Colors (movie), 191

Columbia University, Microsoft pancreatic cancer study and, 28–29, 30

comparison shopping, 265

conclusions

benefits of great, 281–84

to books, 271–72, 279, 280–84

characteristics of best, 272, 274–79

importance of, 283

as pointing way to more things to come, 274–79

purpose of, 279–80

Stephens-Davidowitz’s writing of, 271–72, 281–84

condoms, 5, 122

Congressional Record, and Gentzkow-Shapiro research, 93

conservatives

and origins of political preferences, 169–71

and parents' prejudice against children, 136

and truth about the internet, 140, 141–44, 145

and words as data, 75–76, 93, 95–96

consumers. See customers/consumers

contagious behavior, 178

conversation, and dating, 80–82

corporations

consumers' blows against, 265

danger of empowered, 257–65

reviews of, 265

correlations

causation distinguished from, 221–25

and predicting the stock market, 245–48, 251–52

counties, zooming in on, 172–90, 239–40

Country Music Radio, 202

Craigslist, 117

creativity, and understanding the world, 280, 281

crime

alcohol as contributor to, 196

and danger of empowered government, 266–70

and prison conditions, 235

violent movies and, 193, 194–95, 273

Cundiff, Billy, 223

curiosity

and benefits of digital truth serum, 162, 163

Levitt views about, 280

about number of people who finish books, 283–84

and understanding the world, 280, 281

cursing, and words as data, 83–85

customers/consumers

blows against businesses by, 265

and price discrimination, 265

truth about, 153–57

Cutler, David, 178

Dahl, Gordon, 191–93, 194–96, 196–97n, 197

Dale, Stacy, 238

Dallas, Texas, “Large and Complex Datasets” conference (1977) in, 20–21

data

amount/size of, 15, 20–21, 30–31, 53, 171

benefits of expansion of, 16

bodies as, 62–74

collecting the right, 62

government, 149–50, 266–70

importance of, 26

individual-level, 266–70

as intimidating, 26

Levitt views about, 280

as money-maker, 103

nontraditional sources of, 74

pictures as, 97–102, 103

reimagining of what qualifies as, 55–103

sources of, 14, 15

speed for transmitting, 55–59

and understanding the world, 280

what counts as, 74

words as, 74–97

See also Big Data; data science; small data; specific data

data science

as changing view of world, 34

and counterintuitive results, 37–38

economists' role in development of, 228

future of, 281

goal of, 37–38

as intuitive, 26–33

and who is a data scientist, 27

dating

and examples of Big Data searches, 22

physical appearance and, 82, 120n

and rejection, 120n

and Stormfront members, 138–39

and truth about hate and prejudice, 138–39

and truth about sex, 120n

and words as data, 80–86, 103

Dawn of the Dead (movie), 192

death, and memorable stories, 33

DellaVigna, Stefano, 191–93, 194–96, 196–97n, 197

Democrats

core principles of, 94

and origins of political preferences, 170–71

and words as data, 93–97

See also specific person or election

depression

Google searches for, 31, 110

and handling the truth, 158

and lying, 109, 110

and parents' prejudice against children, 136

developing countries

economies of, 101–2, 103

investing in, 251

digital truth serum

abortion and, 147–50

and child abuse, 145–47, 149–50

and customers, 153–57

and Facebook friends, 150–53

and handling the truth, 158–63

and hate and prejudice, 128–40

and ignoring what people tell you, 153–57

incentives and, 109

and internet, 140–45

sex and, 112–28

sites as, 54

See also lying; truth

digital world, randomized experiments in, 210–19

dimensionality, curse of, 246–52

discrimination

and origins of notable Americans, 182–83

price, 262–65

See also bias; prejudice; race/racism

DNA, 248–50

Dna88 (Stormfront member), 138

doctors, financial incentives for, 230, 240

Donato, Adriana, 266, 269

doppelgangers

benefits of, 263

and health, 203–5

and hunting on social media, 201–3

and predicting future of baseball players, 197–200, 200n, 203

and price discrimination, 262–63, 264

zooming in on, 197–205

dreams, phallic symbols in, 46–48

drugs, as addiction, 219

Duflo, Esther, 208–9, 210, 273

Earned Income Tax Credit, 178, 179

economists

and number of people finishing books, 283

role in data science development of, 228

as soft scientists, 273

See also specific person

economy/economics

complexity of, 273

of developing countries, 101–2, 103

of Philippines cigarette economy, 102

and pictures as data, 99–102

and speed of data, 56–57

and truth about hate and prejudice, 139

See also economists; specific topic

Edmonton, water consumption in, 206

EDU STAR, 276

education

and A/B testing, 276

and digital revolution, 279

and overemphasis on measurability, 253–54, 255–56

in rural India, 209, 210

small data in, 255–56

state spending on, 185

and using online behavior as supplement to testing, 278

See also high school students; tests/testing

Eisenhower, Dwight D., 170–71

elections

and order of searches, 10–11

predictions about, 9–14

voter turnout in, 9–10

elections, 2008

and A/B testing, 211–12

racism in, 2, 6–7, 12, 133, 134

and Stormfront membership, 139

elections, 2012

and A/B testing, 211–12

predictions about, 10

racism in, 2–3, 8, 133, 134

Trump and, 7

elections, 2016

and lying, 107

mapping of, 12–13

polls about, 1

predicting outcome of, 10–14

and racism, 8, 11, 12, 14, 133

Republican primaries for, 1, 13–14, 133

and Stormfront membership, 139

voter turnout in, 11

electronics company, and advertising, 222, 225, 226

“Elite Illusion” (Abdulkadiroglu, Angrist, and Pathak), 236

Ellenberg, Jordan, 283

Ellerbee, William, 34

Eng, Jessica, 236–37

environment, and life expectancy, 177

EPCOR utility company, 193, 194

EQB, 63–64

equality of opportunity, zooming in on, 173–75

Error Bot, 48–49

ethics

and Big Data, 257–65

and danger of empowered government, 267

doppelganger searches and, 262–63

empowered corporations and, 257–65

and experiments, 226

hiring practices and, 261–62

and IQDNA study results, 249

and paying back loans, 257–61

and price discrimination, 262–65

and study of IQ of Facebook users, 261

Ewing, Patrick, 33

experiments

and ethics, 226

and real science, 272–73

See also type of experiment or specific experiment

Facebook

and A/B testing, 211

and addictions, 219, 220

and hiring practices, 261

and ignoring what people tell you, 153–55, 157

and influence of childhood experiences data, 166–68, 171

IQ of users of, 261

Microsoft-Cambridge University study of users of, 261

“News Feed” of, 153–55, 255

and overemphasis on measurability, 254, 255

and pictures as data, 99

and “secrets about people,” 155–56

and size of Big Data, 20

and small data, 255

as source of information, 14, 32

and truth about customers, 153–55

truth about friends on, 150–53

and truth about sex, 113–14, 116

and truth about the internet, 144, 145

and words as data, 83, 85, 87–88

The Facebook Effect: The Inside Story of the Company That Is Connecting the World (Kirkpatrick), 154

Facemash, 156

faces

black, 133

and pictures as data, 98–99

and truth about hate and prejudice, 133

Farook, Rizwan, 129–30

Father’s Day advertising, 222, 225

Fifty Shades of Grey, 157

financial incentives, for doctors, 230, 240

First Law of Viticulture, 73–74

food

and phallic symbols in dreams, 46–48

predictions about, 71–72

and pregnancy, 189–90

football

and advertising, 221–25

zooming in on, 196–97n

Freakonomics (Levitt), 265, 280, 281

Freud, Sigmund, 22, 45–52, 272, 281

Friedman, Jerry, 20, 21

Fryer, Roland, 36

Gabriel, Stuart, 9–10, 11

Gallup polls, 2, 88, 113

gambling/gaming industry, 220–21, 263–65

“Gangnam Style” video, Psy, 152

Garland, Judy, 114, 114n

Gates, Bill, 209, 238–39

gays

in closet, 114–15, 116, 117, 118–19, 161

and dimensions of sexuality, 279

and examples of Big Data searches, 22

and handling the truth, 159, 161

in Iran, 119

and marriage, 74–76, 93, 115–16, 117

mobility of, 113–14, 115

population of, 115, 116, 240

and pornography, 114–15, 114n, 116, 117, 119

in Russia, 119

stereotype of, 114n

surveys about, 113

teenagers as, 114, 116

and truth about hate and prejudice, 129

and truth about sex, 112–19

and wives' suspicions of husbands, 116–17

women as, 116

and words as data, 74–76, 93

Gelles, Richard, 145

Gelman, Andrew, 169–70

gender

and life expectancy, 176

and parents' prejudice against children, 134–36, 135n

of Stormfront members, 137

See also gays

General Social Survey, 5, 142

genetics, and IQ, 249–50

genitals

and truth about sex, 126–27

See also penis; vagina

Gentzkow, Matt, 74–76, 93–97, 141–44

geography

zooming in by, 172–90

See also cities; counties

Germany, pregnancy in, 190

Ghana, pregnancy in, 188

Ghitza, Yair, 169–70

Ginsberg, Jeremy, 57

girlfriends, killing, 266, 269

girls, parents' prejudice against young, 134–36

Gladwell, Malcolm, 29–30

Gnau, Scott, 264

gold, price of, 252

The Goldfinch (Tartt), 283

Goldman Sachs, 55–56, 59

Google

advertisements about, 217–19

and amount of data, 21

and digitalizing books, 77

Mountain View campus of, 59–60, 207

See also specific topic

Google AdWords, 3n, 115, 125

Google Correlate, 57–58

Google Flu, 57, 57n, 71

Google Ngrams, 76–77, 78, 79

Google searches

advantages of using, 60–62

auto-complete in, 110–11

differentiation from other search engines of, 60–62

as digital truth serum, 109, 110–11

as dominant source of Big Data, 60

and the forbidden, 51

founding of, 60–62

and hidden thoughts, 110–12

and honesty/plausibility of data, 9, 53–54

importance/value of, 14, 21

polls compared with, 9

popularity of, 62

power of, 4–5, 53–54

and speed of data, 57–58

and words as data, 76, 88

See also Big Data; specific search

Google STD, 71

Google Trends, 3–4, 3n, 6, 246

Gottlieb, Joshua, 202, 230

government

danger of empowered, 266–70

and predicting actions of individuals, 266–70

and privacy issues, 267–70

spending by, 93, 94

and trust of data, 149–50

and words as data, 93, 94

“Great Body, Great Sex, Great Blowjob” (video), 152, 153

Great Recession, and child abuse, 145–47

The Green Monkey (Horse No. 153), 68

gross domestic product (GDP), and pictures as data, 100–101

Gross National Happiness, 87, 88

Guttmacher Institute, 148, 149

Hannibal (movie), 192, 195

happiness

and pictures as data, 99

See also sentiment analysis

Harrah’s Casino, 264

Harris, Tristan, 219–20

Harry Potter and the Deathly Hallows (Rowling), 88–89, 91

Hartmann, Wesley R., 225

Harvard Crimson, editorial about Zuckerberg in, 155

Harvard University, income of graduates of, 237–39

hate

and danger of empowered governments, 266–67, 268–69

truth about, 128–40, 162–63

See also prejudice; race/racism

health

and alcohol, 207–8

and comparison of search engines, 71

and digital revolution, 275–76, 279

and DNA, 248–49

and doppelgangers, 203–5

methodology for studies of, 275–76

and speed of data transmission, 57

zooming in on, 203–5, 275

See also life expectancy

health insurance, 177

Henderson, J. Vernon, 99–101

The Herd with Colin Cowherd, McCaffrey interview on, 196n

Herzenstein, Michal, 257–61

Heywood, James, 205

high school students

testing of, 231–37, 253–54

and truth about sex, 114, 116

high school yearbooks, 98–99

hiring practices, 261–62

Hispanics, and Harvard Crimson editorial about Zuckerberg, 155

Hitler, Adolf, 227

hockey match, Olympic (2010), 193, 194

Horse No. 85. See American Pharoah

Horse No. 153 (The Green Monkey), 68

horses

and Bartleby syndrome, 66

and examples of Big Data searches, 22

internal organs of, 69–71

pedigrees of, 66–67, 69, 71

predicting success of, 62–74, 256

searches about, 62–74

hours, zooming in on, 190–97

housing, price of, 58

Human Genome Project, 248–49

Human Rights Campaign, 161

humankind, data as means for understanding, 16

humor/jokes, searches for, 18–19

Hurricane Frances, 71–72

Hurricane Katrina, 132

husbands

wives' descriptions of, 160–61, 160–61n

and wives' suspicions about gayness, 116–17

Hussein, Saddam, 93, 94

ignoring what people tell you, 153–57

immigrants, and origins of notable Americans, 184, 186

implicit association test, 132–34

incentives, 108, 109

incest, 50–52, 54, 121

income distribution, 174–78, 185

India

education in rural, 209, 210

pregnancy in, 187, 188–89

and sex/porn searches, 19

Indiana University, and dimensionality study, 247–48

individuals, predicting the actions of, 266–70

influenza, data about, 57, 71

information. See Big Data; data; small data; specific source or search

Instagram, 99, 151–52, 261

Internal Revenue Service (IRS), 172, 178–80. See also taxes

internet

as addiction, 219–20

browsing behavior on, 141–44

as dominated by smut, 151

segregation on, 141–44

truth about the, 140–45

See also A/B testing; social media; specific site

intuition

and A/B testing, 214

and counterintuitive results, 37–38

data science as, 26–33

and the dramatic, 33

as wrong, 31, 32–33

IQ/intelligence

and DNA, 249–50

of Facebook users, 261

and parents' prejudice against children, 135

Iran, gays in, 119

Iraq War, 94

Irresistible (Alter), 219–20

Islamophobia

and danger of empowered governments, 266–67, 268–69

See also Muslims

Ivy League schools

income of graduates from, 237–39

See also specific school

Jacob, Brian, 254

James, Bill, 198–99

James, LeBron, 34, 37, 41, 67

Jawbone, 277

Jews, 129, 138

Ji Hyun Baek, 266

Jobs, Steve, 185

Johnson, Earvin III, 67

Johnson, Lyndon B., 170, 171

Johnson, “Magic,” 67

jokes

and dating, 80–81

and lying, 109

nigger, 6, 15, 132, 133, 134

and truth about hate and prejudice, 132, 133, 134

Jones, Benjamin F., 227, 228, 276

Jordan, Jeffrey, 67

Jordan, Marcus, 67

Jordan, Michael, 40–41, 67

Jurafsky, Dan, 80

Kadyrov, Akhmad, 227

Kahneman, Daniel, 283

Kane, Thomas, 255

Katz, Lawrence, 243

Kaufmann, Sarah, 236–37

Kawachi, Ichiro, 266

Kayak (website), 265

Kennedy, John F., 170, 171, 227

Kerry, John, 8, 244

King John (Shakespeare), 89–90

King, Martin Luther Jr., 132

King, William Lyon Mackenzie (alias), 138–39

Kinsey, Alfred, 113

Kirkpatrick, David, 154

Klapper, Daniel, 225

Knight, Phil, 157

Kodak, and pictures as data, 99

Kohane, Isaac, 203–5

Krueger, Alan B., 56, 238

Ku Klux Klan, 12, 137

Kubrick, Stanley, 190–91

Kundera, Milan, 233

language

and digital revolution, 274, 279

emphasis in, 94

as key to understanding bias, 74–76

and paying back loans, 259–60

and traditional research methods, 274

and U.S. as united or divided, 78–79

See also words

learning. See education

Lemaire, Alain, 257–61

Levitt, Steven, 36, 222, 254, 280, 281. See also Freakonomics

liberals

and origins of political preferences, 169–71

and parents' prejudice against children, 136

and truth about the internet, 140, 141–45

and words as data, 75–76, 93, 95–96

library cards, and lying, 106

life, as imitating art, 190–97

life expectancy, 176–78

Linden, Greg, 203

listening, and dating, 82n

loans, paying back, 257–61

Los Angeles Times, and Obama speech about terrorism, 130

lotteries, 229, 229n

Luca, Michael, 265

Lycos (search engine), 60

lying

and age, 108n

and incentives, 108

and jokes, 109

to ourselves, 107–8, 109

and polls, 107

and pornography, 110

prevalence of, 21, 105–12, 239

and racism, 109

reasons for, 106, 107, 108, 108n

and reimagining data, 103

and search information, 5–6, 12

and sex, 112–28

by Stephens-Davidowitz, 282n

and surveys, 105–7, 108, 108n

and taxes, 180

and voting behavior, 106, 107, 109–10

“white,” 107

See also digital truth serum; truth; specific topic

Ma-Kellams, Christine, 266

Macon County, Alabama, successful/notable Americans from, 183, 186–87

Malik, Tashfeen, 129–30

Manchester University, and dimensionality study, 247–48

Massachusetts Institute of Technology, Pantheon project of, 184–85

Matthews, Dylan, 202–3

McCaffrey, Ed, 196–97n

McFarland, Daniel, 80

McPherson, James, 79

measurability, overemphasis on, 252–56

“Measuring Economic Growth from Outer Space” (Henderson, Storeygard, and Weil), 99–101

media

bias of, 22, 74–77, 93–97, 102–3

and examples of Big Data searches, 22

owners of, 96

and truth about hate and prejudice, 130, 131

and truth about the internet, 143

and words as data, 74–77, 93–97

See also specific organization

Medicare, and doctors' reimbursements, 230, 240

medicine. See doctors; health

Messing, Solomon, 144

MetaCrawler (search engine), 60

Mexicans, and truth about hate and prejudice, 129

Michel, Jean-Baptiste, 76–77, 78–79

Microsoft

and Cambridge University study about IQ of Facebook users, 261

Columbia University pancreatic cancer study and, 28–29, 30

and typing errors by searchers, 48–50

Milkman, Katherine L., 91–92

Minority Report (movie), 266

Minsky, Marvin, 273

minutes, zooming in on, 190–97

Moneyball, Oakland A’s profile in, 254, 255

Moore, Julianne, 185

Moskovitz, Dustin, 238–39

movies

and advertising, 224–25

and crime, 193, 194–95, 273

violent, 190–97, 273

zooming in on, 190–97

See also specific movie

msnbc.com, 143

murder

and danger of empowered government, 266–67, 268–69

See also violence

Murdoch, Rupert, 96

Murray, Patty, 256

Muslims

and danger of empowered governments, 266–67, 268–69

and truth about hate and prejudice, 129–31, 162–63

Nantz, Jim, 223

National Center for Health Statistics, 181

National Enquirer magazine, 150–51, 152

national identity, 78–79

natural experiments, 226–28, 229–30, 234–37, 239–40

NBA. See basketball

neighbors, and monetary windfalls, 229

Netflix, 156–57, 203, 212

Netzer, Oded, 257–61

New England Patriots-Baltimore Ravens games, 221, 222–24

New Jack City (movie), 191

New York City, Rolling Stones song about, 278

New York magazine, and A/B testing, 212

New York Mets, 165–66, 167, 169, 171

New York Post, and words as data, 96

New York Times

Clinton (Bill) search in, 61

and IQDNA study results, 249

and Obama speech about terrorism, 130

Stephens-Davidowitz’s first column about sex in, 282

Stormfront users and, 137, 140, 145

and truth about internet, 145

types of stories in, 92

vaginal odors story in, 161

and words as data, 95–96

New York Times Company, and words as data, 95–96

New Yorker magazine

Duflo study in, 209

and Stephens-Davidowitz’s doppelganger search, 202

News Corporation, 96

newslibrary.com, 95

Nielsen surveys, 5

Nietzsche, Friedrich, 268

Nigeria, pregnancy in, 188, 189, 190

“nigger”

and hate and prejudice, 6, 7, 131–34, 244

jokes, 6, 15, 132, 133, 134

motivation for searches about, 6

and Obama’s election, 7, 244

and power of Big Data, 15

prevalence of searches about, 6

and Trump’s election, 14

night light, and pictures as data, 100–101

Nike, 157

Nixon, Richard M., 170, 171

numbers, obsessive infatuation with, 252–56

Obama, Barack

and A/B testing, 211–14

campaign home page for, 212–14

elections of 2008 and, 2, 6–7, 133, 134, 211–12

elections of 2012 and, 8–9, 10, 133, 134, 211–12

and racism in America, 2, 6–7, 8–9, 12, 134, 240, 243–44

State of the Union (2014) speech of, 159–60

and truth about hate and prejudice, 130–31, 133, 134, 162–63

Ocala horse auction, 65–66, 67, 69

Oedipal complex, Freud theory of, 50–51

OkCupid (dating site), 139

Olken, Benjamin A., 227, 228

127 Hours (movie), 90, 91

Optimal Decisions Group, 262

Or, Flora, 266

Ortiz, David “Big Papi,” 197–200, 200n, 203

“out-of-sample” tests, 250–51

Page, Larry, 60, 61, 62, 103

pancreatic cancer, Columbia University-Microsoft study of, 28–29

Pandora, 203

Pantheon project (Massachusetts Institute of Technology), 184–85

parents/parenting

and child abuse, 145–47, 149–50, 161

and examples of Big Data searches, 22

and prejudice against children, 134–36, 135n

Parks, Rosa, 93, 94

Parr, Ben, 153–54

Pathak, Parag, 235–36

PatientsLikeMe.com, 205

patterns, and data science as intuitive, 27, 33

Paul, Chris, 37

paying back loans, 257–61

PECOTA model, 199–200, 200n

pedigrees

of basketball players, 67

of horses, 66–67, 69, 71

pedometer, Chance's emphasis on, 252–53

penis

and Freud’s theories, 46

and phallic symbols in dreams, 46–47

size of, 17, 19, 123–24, 124n, 127

“penistrian,” 45, 46, 48, 50

Pennsylvania State University, income of graduates of, 237–39

Peysakhovich, Alex, 254

phallic symbols, in dreams, 46–48

Philadelphia Daily News, and words as data, 95

Philippines, cigarette economy in, 102

physical appearance

and dating, 82, 120n

and parents' prejudice against children, 135–36

and truth about sex, 120, 120n, 125–26, 127

physics, as science, 272–73

pictures, as data, 97–102, 103

Pierson, Emma, 160n

Piketty, Thomas, 283

Pinky Pizwaanski (horse), 70

pizza, information about, 77

PlentyOfFish (dating site), 139

Plomin, Robert, 249–50

political science, and digital revolution, 244, 274

politics

and A/B testing, 211–14

complexity of, 273

and ignoring what people tell you, 157

and origin of political preferences, 169–71

and truth about the internet, 140–44

and words as data, 95–97

See also conservatives; Democrats; liberals; Republicans

polls

Google searches compared with, 9

and lying, 107

reliability of, 12

See also specific poll or topic

Pop-Tarts, 72

Popp, Noah, 202

Popper, Karl, 45, 272, 273

PornHub (website), 14, 50–52, 54, 116, 120–22, 274

pornography

as addiction, 219

and bias of social media, 151

and breastfeeding, 19

cartoon, 52

child, 121

and digital revolution, 279

and gays, 114–15, 114n, 116, 117, 119

honesty of data about, 53–54

and incest, 50–52

in India, 19

and lying, 110

popular videos on, 152

popularity of, 53, 151

and power of Big Data, 53

search engines for, 61n

and truth about sex, 114–15, 117

unemployed and, 58, 59

Posada, Jorge, 200

poverty

and life expectancy, 176–78

and words as data, 93, 94

See also income distribution

predictions

and data science as intuitive, 27

and getting the numbers right, 74

and what counts as data, 74

and what vs. why it works, 71

See also specific topic

pregnancy, 20, 187–90

prejudice

implicit, 132–34

of parents against children, 134–36, 135n

subconscious, 134, 163

truth about, 128–40, 162–63

See also bias; hate; race/racism; Stormfront

Premise, 101–2, 103

price discrimination, 262–65

prison conditions, and crime, 235

privacy issues, and danger of empowered government, 267–70

property rights, and words as data, 93, 94

proquest.com, 95

Prosper (lending site), 257

Psy, “Gangnam Style” video of, 152

psychics, 266

psychology

and digital revolution, 274, 277–78, 279

as science, 273

as soft science, 273

and traditional research methods, 274

Quantcast, 137

questions

asking the right, 21–22

and dating, 82–83

race/racism

causes of, 18–19

elections of 2008 and, 2, 6–7, 12, 133

elections of 2012 and, 2–3, 8, 133

elections of 2016 and, 8, 11, 12, 14, 133

explicit, 133, 134

and Harvard Crimson editorial about Zuckerberg, 155

and lying, 109

map of, 7–9

and Obama, 2, 6–7, 8–9, 12, 133, 240, 243–44

and predicting success in basketball, 35, 36–37

and Republicans, 3, 7, 8

Stephens-Davidowitz’s study of, 2–3, 6–7, 12, 14, 243–44

and Trump, 8, 9, 11, 12, 14, 133

and truth about hate and prejudice, 129–34, 162–63

See also Muslims; “nigger”

randomized controlled experiments

and A/B testing, 209–21

and causality, 208–9

rape, 121–22, 190–91

Rawlings, Craig, 80

“rawtube” (porn site), 59

Reagan, Andy, 88, 90, 91

Reagan, Ronald, 227

regression discontinuity, 234–36

Reisinger, Joseph, 101–2, 103

relationships, lasting, 31–33

religion, and life expectancy, 177

Renaissance (hedge fund), 246

Republicans

core principles of, 94

and origins of political preferences, 170–71

and racism, 3, 7, 8

and words as data, 93–97

See also specific person or election

research

and expansion of research methodology, 275–76

See also specific researcher or research

reviews, of businesses, 265

“Rocket Tube” (gay porn site), 115

Rolling Stones, 278

Romney, Mitt, 10, 212

Roseau County, Minnesota, successful/notable Americans from, 186, 187

Runaway Bride (movie), 192, 195

sabermetricians, 198–99

San Bernardino, California, shooting in, 129–30

Sands, Emily, 202

science

and Big Data, 273

and experiments, 272–73

real, 272–73

at scale, 276

soft, 273

search engines

differentiation of Google from other, 60–62

for pornography, 61n

reliability of, 60

word-count, 71

See also specific engine

searchers, typing errors by, 48–50

searches

negative words used in, 128–29

See also specific search

“secrets about people,” 155–56

Seder, Jeff, 63–66, 68–70, 71, 74, 155, 256

segregation, 141–44. See also bias; discrimination; race/racism

self-employed people, and taxes, 178–80

sentiment analysis, 87–92, 247–48

sex

as addiction, 219

and benefits of digital truth serum, 158–59, 161

and childhood experiences, 50–52

condoms and, 5, 122

and digital revolution, 274, 279

and dimensions of sexuality, 279

during marriage, 5–6

and fetishes, 120

and Freud, 45–52

Google searches about, 5–6, 51–52, 114, 115, 117, 118, 122–24, 126, 127–28

and handling the truth, 158–59, 161

and Harvard Crimson editorial about Zuckerberg, 155

how much, 122–23, 124–25, 127

in India, 19

new information about, 19

oral, 128

and physical appearance, 120, 120n, 125–26, 127

and power of Big Data, 53

pregnancy and having, 189

Rolling Stones song about, 278

and sex organs, 123–24

Stephens-Davidowitz’s first New York Times column about, 282

and traditional research methods, 274

truth about, 5–6, 112–28, 114n, 117

and typing errors, 48–50

and women’s genitals, 126–27

See also incest; penis; pornography; rape; vagina

Shadow (app), 47

Shakespeare, William, 89–90

Shapiro, Jesse, 74–76, 93–97, 141–44, 235, 273

“Shattered” (Rolling Stones song), 278

shopping habits, predictions about, 71–74

The Signal and the Noise (Silver), 254

Silver, Nate, 10, 12–13, 133, 199, 200, 254, 255

Simmons, Bill, 197–98

Singapore, pregnancy in, 190

Siroker, Dan, 211–12

sleep

and digital revolution, 279

Jawbone and, 276–77

and pregnancy, 189

“Slutload,” 58

small data, 255–56

smiles, and pictures as data, 99

Smith, Michael D., 224

Snow, John, 275

Sochi, Russia, gays in, 119

social media

bias of data from, 150–53

doppelganger hunting on, 201–3

and wives' descriptions of husbands, 160–61, 160–61n

See also specific site or topic

social science, 272–74, 276, 279

social security, and words as data, 93

socioeconomic background

and predicting success in basketball, 34–41

See also pedigrees

sociology, 273, 274

Soltas, Evan, 130, 162, 266–67

South Africa, pregnancy in, 189

Southern Poverty Law Center, 137

Spain, pregnancy in, 190

Spartanburg Herald-Journal (South Carolina), and words as data, 96

specialization, extreme, 186

speed, for transmitting data, 56–59

“Spider Solitaire,” 58

Stephens-Davidowitz, Noah, 165–66, 165–66n, 169, 206, 263

Stephens-Davidowitz, Seth

ambitions of, 33

lying by, 282n

mate choice for, 25–26, 271

motivations of, 2

obsessiveness of, 282, 282n

professional background of, 14

and writing conclusions, 271–72, 279, 280–84

Stern, Howard, 157

stock market

data for, 55–56

and examples of Big Data searches, 22

Summers-Stephens-Davidowitz attempt to predict the, 245–48, 251–52

Stone, Oliver, 185

Stoneham, James, 266, 269

Storeygard, Adam, 99–101

stories

categories/types of, 91–92

viral, 22, 92

and zooming in, 205–6

See also specific story

Stormfront (website), 7, 14, 18, 137–40

stretch marks, and pregnancy, 188–89

Stuyvesant High School (New York City), 231–37, 238, 240

suburban areas, and origins of notable Americans, 183–84

successful/notable Americans

factors that drive, 185–86

zooming in on, 180–86

suffering, and benefits of digital truth serum, 161

suicide, and danger of empowered government, 266, 267–68

Summers, Lawrence

and Obama-racism study, 243–44

and predicting the stock market, 245, 246, 251–52

Stephens-Davidowitz’s meeting with, 243–45

Sunstein, Cass, 140

Super Bowl games, advertising during, 221–25, 239

Super Crunchers (Gnau), 264

Supreme Court, and abortion, 147

Surowiecki, James, 203

surveys

in-person, 108

internet, 108

and lying, 105–7, 108, 108n

and pictures as data, 97

skepticism about, 171

telephone, 108

and truth about sex, 113, 116

and zooming in on hours and minutes, 193

See also specific survey or topic

Syrian refugees, 131

Taleb, Nassim, 17

Tartt, Donna, 283

TaskRabbit, 212

taxes

cheating on, 22, 178–80, 206

and examples of Big Data searches, 22

and lying, 180

and self-employed people, 178–80

and words as data, 93–95

zooming in on, 172–73, 178–80, 206

teachers, using tests to judge, 253–54

teenagers

adopted, 108n

as gay, 114, 116

lying by, 108n

and origins of political preferences, 169

and truth about sex, 114, 116

See also children

television

and A/B testing, 222

advertising on, 221–26

Terabyte, 264

terrorism, 18, 129–31

tests/testing

of high school students, 231–37, 253–54

and judging teacher, 253–54

and obsessive infatuations with numbers, 253–54

online behavior as supplement to, 278

and small data, 255–56

See also specific test or study

Thiel, Peter, 155

Think Progress (website), 130

Thinking, Fast and Slow (Kahneman), 283

Thome, Jim, 200

Tourangeau, Roger, 107, 108

towns, zooming in on, 172–90

Toy Story (movie), 192

Trump, Donald

elections of 2012 and, 7

and ignoring what people tell you, 157

and immigration, 184

issues propagated by, 7

and origins of notable Americans, 184

polls about, 1

predictions about, 11–14

and racism, 8, 9, 11, 12, 14, 133, 139

See also elections, 2016

truth

benefits of knowing, 158–63

handling the, 158–63

See also digital truth serum; lying; specific topic

Tuskegee University, 183

Twentieth Century Fox, 221–22

Twitter, 151–52, 160–61n, 201–3

typing errors by searchers, 48–50

The Unbearable Lightness of Being (Kundera), 233

Uncharted (Aiden and Michel), 78–79

unemployment

and child abuse, 145–47

data about, 56–57, 58–59

unintended consequences, 197

United States

and Civil War, 79

as united or divided, 78–79

University of California, Berkeley, racism in 2008 election study at, 2

University of Maryland, survey of graduates of, 106–7

urban areas

and life expectancy, 177

and origins of notable Americans, 183–84, 186

vagina, smells of, 19, 126–27, 161

Varian, Hal, 57–58, 224

Vikingmaiden88, 136–37, 140–41, 145

violence

and real science, 273

zooming in on, 190–97

See also murder

voter registration, 106

voter turnout, 9–10, 109–10

voting behavior, and lying, 106, 107, 109–10

Vox, 202

Walmart, 71–72

Washington Post, and words as data, 75, 94

Washington Times, and words as data, 75, 94–95

wealth

and life expectancy, 176–77

See also income distribution

weather, and predictions about wine, 73–74

Weil, David N., 99–101

Weiner, Anthony, 234n

white nationalism, 137–40, 145. See also Stormfront

Whitepride26, 139

Wikipedia, 14, 180–86

wine, predictions about, 72–74

wives

and descriptions of husbands, 160–61, 160–61n

and suspicions about gayness of husbands, 116–17

women

breasts of, 125, 126

butt of, 125–26

genitals of, 126–27

violence against, 121–22

See also girls; wives; specific topic

words

and bias, 74–76, 93–97

and categories/types of stories, 91–92

as data, 74–97

and dating, 80–86

and digital revolution, 278

and digitalization of books, 77, 79

and gay marriage, 74–76

and sentiment analysis, 87–92

and U.S. as united or divided, 78–79

workers’ rights, 93, 94

World Bank, 102

World of Warcraft (game), 220

Wrenn, Doug, 39–40, 41

Yahoo News, 140, 143

yearbooks, high school, 98–99

Yelp, 265

Yilmaz, Ahmed (alias), 231–33, 234, 234n

YouTube, 152

Zayat, Ahmed, 63–64, 65

Zero to One (Thiel), 155

zooming in

on baseball, 165–69, 165–66n, 171, 197–200, 200n, 203, 206, 239

benefits of, 205–6

on counties, cities, and towns, 172–90, 239–40

and data size, 171, 172–73

on doppelgangers, 197–205

on equality of opportunity, 173–75

on gambling, 263–65

on health, 203–5, 275

on income distribution, 174–76, 185

and influence of childhood experiences, 165–71, 165–66n, 206

on life expectancy, 176–78

on minutes and hours, 190–97

and natural experiments, 239–40

and origin of political preferences, 169–71

on pregnancy, 187–90

stories from, 205–6

on successful/notable Americans, 180–86

on taxes, 172–73, 178–80, 206

Zuckerberg, Mark, 154–56, 157, 158, 238–39