We’ve heard a lot lately on EconTalk about the power of Big Data. But this week, host Russ Roberts invited a skeptic, who is also a data scientist, to discuss some of its possible dangers.
Cathy O’Neil (who was also a guest in 2013, when she and Russ discussed the Occupy Wall Street movement) is the author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. She is particularly concerned with commercial uses of big data, which she finds leave the poor especially vulnerable to manipulation.
Their conversation begins with a discussion of “recidivism risk scores,” which judges often use in determining sentences for convicted felons. Roberts and O’Neil are similarly disturbed by this use of data…and they reopen a common EconTalk theme of late, the distinction between accuracy and causality. (In another recent episode, Susan Athey referred to this distinction as that between prediction and causation.) The conversation then moves to merit pay for teachers and commercial applications of big data. As you can imagine, Roberts’s and O’Neil’s concerns begin to diverge. The big question I’m left with is whether to be optimistic or pessimistic about the applications of Big Data going forward…Both Roberts and O’Neil agree we’re still in its early days…What do you think? Does Big Data do more harm than good?
READER COMMENTS
Rick Hull
Oct 6 2016 at 6:15pm
I haven’t listened to the podcast but I’ve worked as a software engineer with Big Data, particularly in terms of A/B testing (do more people click on a red button or a blue button). For a website operator, it matters a great deal how users behave in statistical aggregate. 3% more “conversions” can make or break a year.
But for an individual website user, Big Data doesn’t have much value. In fact, purely data-driven decisions can make the website experience much worse for a typical user. For example, giant ads with tiny “close” buttons that result in users mistakenly clicking on ads. It takes some rigor, design sense, and theory to counteract this kind of data-driven uptick in conversions.
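Hull’s point about a 3% lift in conversions can be made concrete with a quick significance check. A minimal sketch of a two-proportion z-test (the numbers here are hypothetical, not from the podcast or the comment):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate different from A's?"""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical A/B test: red button converts 10.0%, blue button 10.3%
z = two_proportion_ztest(1000, 10_000, 1030, 10_000)
print(round(z, 2))  # z well below 1.96, so this lift alone is not significant
```

The same 3% relative lift that “can make or break a year” is statistically indistinguishable from noise at this sample size, which is why operators keep collecting aggregate data at scales where individual users barely register.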
Roger McKinney
Oct 6 2016 at 7:42pm
I worked for a small statistical software company when data mining was first becoming popular. It turned out to be a fad. Big data is a fad, too. It plays on the public’s ignorance of statistics. The public thinks more data gives more accurate results. It doesn’t. As Deirdre McCloskey has pointed out, it merely makes trivial things appear significant. Statisticians get their best results with random samples of data containing about 100 records. The best thing to come along in decades is machine learning and neural network programs.
James
Oct 7 2016 at 5:27pm
I haven’t listened yet, but now I’m wondering what Roberts and O’Neil would prefer be used instead of data when making parole decisions.
Stephen Gradijan
Oct 7 2016 at 9:50pm
I haven’t listened to the podcast yet either, but I recently watched this video on crime fighting by the NJ attorney general: https://www.ted.com/talks/anne_milgram_why_smart_statistics_are_the_key_to_fighting_crime?language=en
James
Oct 8 2016 at 1:54pm
Roger McKinney,
Statisticians with more than one hundred observations in front of them can easily take a random sample of about a hundred observations from their data to take advantage of whatever benefits come with data sets of that size.
pgbh
Oct 9 2016 at 5:53pm
I’ve always been a fan of “big data”. That’s because I think if you want to make better decisions, it helps to have more information.
But if big data also increases inequality and threatens democracy, well, now I’m three for three on favoring it!
Maybe it’s not too late to become a statistician, after all …
BorrowedUsername
Oct 11 2016 at 1:41pm
I just finished the book and really enjoyed it. I didn’t agree on every point, but it’s still a very useful framework for thinking about how models can have pernicious effects without any bad intention on the part of the designer.
Often one of my rants against these models (which I have used a lot) is that they have some value in understanding the world as it is, but not as it could be. It’s very important that experiments and decisions be mostly tied to real testable hypotheses about behavior and not about the model itself. Many times these models introduce selection bias that is hidden beneath the surface and might not even be clear to the people using the model.
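The hidden selection bias this commenter describes can be simulated in a few lines. The scenario below is a hypothetical illustration, not from the book: a lender’s model is trained only on applicants a previous rule already approved, so the data it sees overstates how often people repay.

```python
import random

random.seed(1)

# Hypothetical population: each person's repayment probability equals their score
population = [(score, random.random() < score)
              for score in (random.random() for _ in range(10_000))]

# The model only ever observes applicants an earlier rule approved (score > 0.5),
# so its training data is selected, not representative of the population
observed = [(s, repaid) for s, repaid in population if s > 0.5]

repay_rate_all = sum(r for _, r in population) / len(population)
repay_rate_observed = sum(r for _, r in observed) / len(observed)
print(round(repay_rate_all, 2), round(repay_rate_observed, 2))
```

The observed repayment rate comes out around 0.75 against a true population rate near 0.5: the model describes the world its own gatekeeping created, not the world as it could be, and nobody downstream sees the bias unless they look for it.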