by Sydney Williams:
Big data is here. It is everywhere. It surrounds us. It was here before it became a “buzzword.” And it is expanding. Determining the quantity of information and how fast it grows is almost impossible.
The authors of Big Data, Viktor Mayer-Schönberger and Kenneth Cukier, cite a study by Martin Hilbert of the University of Southern California’s Annenberg School as one of the more comprehensive attempts to measure it. According to Professor Hilbert, in 2007 there were more than 300 exabytes of stored information. (An exabyte is a quintillion bytes: a 1 followed by 18 zeros. A zettabyte is 1,000 exabytes.) And, according to the professor, the amount of information is doubling every two years, suggesting that there are now 1.2 zettabytes of stored information.
To put that number in some sort of perspective, Messrs. Mayer-Schönberger and Cukier note that in the third century BC, Ptolemy II of Egypt purchased (or stole) a copy of every written work for his library in Alexandria. The amount of information now available is sufficient to provide each of the world’s 7 billion people with an individual, distinct library 340 times bigger than Ptolemy’s. While no one knows the exact size of the library in Alexandria, estimates are that it contained about 500,000 scrolls. It was burned around 50 BC, allegedly (and apparently inadvertently) by Julius Caesar.
The pursuit of knowledge has been a central purpose of man since he first appeared. The telescope gives us a sense of the age of the universe (about 13.8 billion years) and the knowledge that it is still expanding. The microscope introduced us to microbiology and has allowed us to see living creatures like the hydrothermal worm, which is less than half a millimeter in size. Magnified 525 times, it is an ugly creature with sharp teeth. Big data allows us to aggregate this and other, more mundane, information, most interestingly data about the way we behave. It is what is subsequently done with that information that should concern us.
Given the proliferation of smart phones, iPads, and Twitter, YouTube and Facebook accounts, we readily know the source of the data. It is estimated that there are a billion smart phones in use, a number that is expected to reach two billion in two years. The average smart phone has 41 apps. On average, these billion users spend over two hours a day surfing the web and checking messages. Additionally, the authors of Big Data note that there were 400 million tweets a day in 2012, a number growing at 200% per year. Google receives more than 3 billion search queries every day “and saves them all.” Each day Google processes a volume of data a thousand times greater than all the printed material in the Library of Congress. Eight hundred million YouTube users upload a total of an hour of video every second. As of last October, Facebook claimed 1.01 billion monthly active users, who upload 10 million photographs every hour. It is impossible for people to hide from their pasts. Big Brother is watching, but who is he, and is anyone watching him?
Mr. Mayer-Schönberger and Mr. Cukier write that the era of big data will challenge the way we live and interact in the world. “Society will need to shed some of its obsession with causality in exchange for simple correlations: not knowing the why, but only the what.” That strikes me as slighting curiosity.
Applications for all this information are limited only by man’s imagination. Government, science, medicine, finance and marketing seem obvious, but energy exploration and manufacturing are beneficiaries as well. In fact, these changes will affect virtually all existing activities. It is the predictive nature of the data that especially interests the authors. Access to these data banks will allow individuals to save money on air tickets; it will help local health officials predict flu outbreaks and better prepare for storms like Katrina and Sandy. Insurance companies will be able to provide health and life insurance without physical exams. Cars will drive themselves, and Wal-Mart will know which flavor of Pop-Tarts to stock at the front of a store before a hurricane. (“Strawberry,” they recently answered.) Universities will be better able to predict which applicants to admit and which courses students are likely to select, allowing them to plan faculty assignments more accurately. David Brooks wrote on this subject in his New York Times column last Friday, “Forecasting Fox.” The column dealt with Philip Tetlock’s 2006 book, Expert Political Judgment, which argued that pundits and experts are terrible predictors. Far better were those who used algorithms for correlating data and applied probabilistic thinking. “Being able to look at a narrow question from many vantage points and quickly readjust the probabilities was tremendously useful.” A side benefit, as Mr. Brooks pointed out, is that this approach may help depolarize politics, since the best predictions were based on raw data and probability, not on preconceived political opinions.
Useful applications of this mound of data will be found in almost every aspect of our lives, but the owners of this data will bear an enormous responsibility. And there is the major, unresolved issue of privacy. Who will monitor the collectors and owners of the data? Should the data reside in government or in private hands? Google alone has predictive ability regarding millions of people. Can it be trusted to treat that information responsibly, or will it be for sale? What happens if this information gets into unfriendly hands? Will the emphasis on the what cause us to lose interest in the why? Is that a good thing? Is education not largely about determining why things happen? Why is the Middle East a hot spot for wars? Why was Shakespeare able to write such magnificent plays and poems? Why are you curious and I’m not, and why are your eyes brown and mine hazel? Why are some people prone to bad luck and others not? Why are both too much and too little information bad? Can we afford not to keep asking why? As a society, we must spend time considering these questions and more. It is not so much the volume of information; that is a given. It is the manipulation of that data, and why and by whom it is done, that should concern us.
With government becoming increasingly pervasive, the desire to protect our privacy is only natural. Steve Lohr, who writes for the New York Times on technology, innovation and finance, wrote on the subject of privacy in the business section of Sunday’s paper. In his article, he cited a report from the World Economic Forum, which suggested that “collected data be tagged with a software code that included an individual’s preference for how his or her data are used.” That sounds wise, but it raises the question: do children, criminals, and the mentally unstable have similar rights? Mr. Lohr quotes Dr. Alex Pentland of MIT, who supports limits on data collection as long as they do not damage the “public good.” But his definition of the public good may differ from mine. Does the “nanny state” of Mayor Bloomberg suggest we are becoming increasingly subservient to an all-knowing “Big Brother”? Is that not the inevitable consequence of a government that becomes more and more pervasive? Perhaps in preparation, Mayor Bloomberg has formed an Office of Policy and Strategic Planning, a group that Alan Feuer describes in the New York Times as “a geek squad of civic-minded number crunchers.”
We live in an exciting age. I love the ability to text my grandchildren. However, the rise in connectedness increases the opportunities for hackers and makes all of us more vulnerable, as Chinese cyber attackers have shown. In a world in which relativism dominates, definitions of good and bad are in the eye of the beholder. Most Americans would agree that if hacking into Iran’s computers would deprive that country of nuclear weapons, the world would be a better place. Of course, the Iranians and Russians would disagree. If drones are used willfully to spy on political dissidents, that would appear to be a violation of our privacy rights, but not if they are used to rout terrorists. Can the line between the two be drawn firmly, or is there a grey area? The possibility of using information for good and the potential for using it for harm often reside within the same database, the only difference being the user. Researchers at the University of California recently tested the ability to disable the braking system of a car. Traveling at forty miles an hour on an unused airport runway, the chase car was able to disrupt the electrical braking system of the car in front. That would be useful to the police, but it could be catastrophic if the man behind me simply didn’t like the way I was driving.
In his novel 1984, George Orwell expressed his fear of an omniscient, omnipresent, omnipotent government – one that could, for example, discover what an individual is thinking without that person being aware. While we may not be at that point now, that is the future the authors of Big Data certainly imply, with their emphasis on predictability. We cannot and should not stop progress, but we should enter this new realm with our eyes and our minds wide open.
Nothing will (or should) impede