Big Data in Our Lives: For Better and For Worse

An engaging and thought-provoking book like Big Data: A Revolution that Will Transform How We Live, Work, and Think couldn’t be any timelier. Given the crash course on data collection and analysis the American public has gotten from our national security agencies in recent months, we need to know more, and we need to know now.

Viktor Mayer-Schönberger and Kenneth Cukier call themselves “messengers” about Big Data and not necessarily “evangelists,” but it’s hard not to get a little excited by the promise that Big Data analysis can hold for all aspects of our lives, assuming, of course, it’s done well and with good or at least neutral intentions.

What fascinates the authors most about Big Data is that it ends centuries of statistical analysis by sampling. In the old days, there wasn’t enough computational power or data storage capacity to analyze every data point, so statisticians were forced to extrapolate from small samples. Not anymore. It has become incredibly easy to grab several sets of data, even unstructured data sets that don’t match, and blend them together for instant analysis. Do your Netflix rentals indicate whether your house has substandard fire protection? There’s probably a way to figure it out.

Of course, Big Data can be a blunt tool. Mayer-Schönberger and Cukier emphasize that Big Data asks and answers only “what,” not “why,” and this, too, is a major shakeup to decades of statistical analysis. Causality takes a back seat. Why will certain parolees be the most likely ones to commit crimes again? Big Data analysis can figure out who they are, but it doesn’t explain why they are the ones. Used cars painted orange are the most likely vehicles to be in good condition. That we know. Why is it true? Does it matter?

Precision may also take a back seat. “To reap the benefits of harnessing data at scale, we have to accept messiness as par for the course, not as something we should try to eliminate,” the authors say. The SQL world is quickly becoming the NoSQL world.

The datafication of everything is sending us down the path toward predictive analytics. The books Amazon recommends to you are more spot on than ever, and New York City can predict with a good degree of accuracy which manhole covers are most likely to explode. The danger, of course, is that if these predictions are 90 percent accurate, 10 percent of people may fail to get the loan they desire or, in the most dystopian vision, people may be falsely accused of crimes even before a crime is committed, à la Minority Report. (Remember when the TSA grabbed Ted Kennedy at the airport?) Big Data gurus often talk about the anonymization of data as a way of reducing privacy risks, but as we’ve seen recently, our privacy is constantly being chipped away, often with our own permission and assistance (those Facebook “likes” aren’t just for fun).

Benefits and risks: Big Data presents us with plenty of both as Moore’s Law continues to reshape every aspect of our daily lives. This book explains both well, and we need to understand them as the datafication of the world continues at breakneck speed.

Big Data: A Revolution that Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier, hardcover, 242 pages. Published by Houghton Mifflin Harcourt.

Comments

  1. BY DF says:

    Wow Don, so many inaccuracies in your write-up. I don’t know if they are your opinions or the authors of the book that you reviewed, but I will address them here:

    1) statisticians sampled data because they didn’t have the computational power to use a full dataset….ummm, no. statisticians (like me, a quantitative methods background) sample because there is no need to go through millions or billions of records to achieve the same result…in fact, it is generally provable, for a normally distributed dataset, that no more than 10k data points are necessary to achieve the same statistical accuracy as a million data points

    2) statistics can tell “why” not “what” whereas “big data” can only tell “what.” this, is also incorrect. in fact, in every stats 101 course the mantra is “correlation does NOT imply causality.” in other words, stats also only tell “what” not “why.”

    I wish you, or the authors, had taken some basic statistics courses so these inaccuracies don’t become some informed truth out in the IT community.

  2. BY Matt says:

    @DF:

    1) How would you build a recommendation system of the Amazon type, with millions of books, with only 10k data points?

    2) You can establish causality in experimental settings where you can control the independent variables, such as a randomized controlled trial.
