The New World of Massive Data Mining

Flickr user: Daremoshiranai http://www.flickr.com/photos/daremoshiranai/

Private and government groups are finding new ways to mine massive troves of digital data. Tom Gjelten and a panel of experts look at the implications for national security, education, science and medicine, as well as the privacy concerns.

Every time you go on the Internet, make a phone call, send an email, pass a traffic camera or pay a bill, you create data, electronic information. In all, 2.5 quintillion bytes of data are created each day. This massive pile of information from all sources is called “Big Data.” It gets stored somewhere, and every day the pile gets bigger. Government and industry are finding new ways to analyze it. Last week the administration announced an initiative to aid the development of Big Data computing. A panel of experts joins guest host Tom Gjelten to discuss the opportunities -- for business, science, medicine, education, and security … but also the privacy concerns.

Guests

John Villasenor

senior fellow at the Brookings Institution and professor of electrical engineering at UCLA.

Michael Leiter

senior counselor, Palantir Technologies, former director, National Counterterrorism Center.

Dr. Suzanne Iacono

co-chair, Big Data Senior Steering Group and senior science adviser, Directorate for Computer and Information Science and Engineering at the National Science Foundation.

Daphne Koller

professor, Stanford Artificial Intelligence Laboratory.

Program Highlights

The term big data refers to the massive amounts of digital information companies and governments collect about us and our surroundings every day: pictures, records, temperatures, conversations. Our guests discuss how government and private industry are using big data and the main concerns surrounding its collection and use.

What Is "Big Data?"

Villasenor said that big data is "really big." The amount of data that's estimated to have been created or replicated would fill 11 billion iPod classics, each holding about 160 gigabytes. "Remember that the world population is only 7 billion so that's a truly incomprehensible amount of data," Villasenor said.

Practical Uses

Every organization, whether it's government or private sector, uses information in different ways, said Leiter. In the world of terrorism, data that was collected clandestinely could be cross-checked with information that was available publicly to try to identify people who were doing suspicious things. In the private sector, organizations like banks use data routinely to identify cyber fraud and organized crime activity. "There's almost no application, either in government or the private sector, that can't benefit from some of this big data," Leiter said.

Privacy An "Enormous" Concern

Privacy is an enormous concern, but big data isn't always directly correlated with privacy, Villasenor said. For instance, the total amount of data needed to represent all the websites an average person visits in one year is not that big - about one or two megabytes. But a lot of people would consider that information very private, Villasenor said. "That said, of course, the more data that's out there, then the more opportunity there is that it could potentially be used in ways that were detrimental to privacy," he said.

You can read the full transcript here.

Comments

This interesting and important program demonstrates how government seeks to assess and internet companies seek to monetize their understanding of individual users’ online activities.

What strategies can individual users employ to obscure their activities or, at least, make their online activities anonymous so they cannot be attributed to the individual?

Which enterprises infer from the prior question that I am trying to hide bad acts or, at least, bad intent?

April 2, 2012 - 11:34 am

I'm concerned about the potential of Big Data to manipulate political opinion and election results.

April 2, 2012 - 11:36 am

Targeted advertising is nothing new. As I've been planning my parents' 50th wedding anniversary, I came across several letters to my mother congratulating her on her engagement and offering her their services. These letters were from florists, jewelers and photographers.

The difference today is the size, scope and speed of the targeted ads. Businesses always have and always will find ways to build their business.

April 2, 2012 - 11:54 am

The largest current data mining operation in education, which is being used to improve learning, is occurring at the #1 most visited online educational web site, The Khan Academy. The evidence of that is shown in a CBS 60 Minutes video at

http://www.cbsnews.com/video/watch/?id=7401696n&tag=contentBody;storyMed...

See the last 3 minutes of that video.

April 2, 2012 - 11:57 am

The guests were well-qualified IT professionals who understood the high-level potential of Big Data, but it would have been helpful if you had also had a guest expert in the medical applications of Big Medical Data. IBM/Watson is doing a pilot program with WellPoint to overlay medical AI and a massive database. The implications of this for improving health care delivery quality while reducing cost need in-depth discussion. Big Medical Data would also impact the cost of doing medical research and cut the cost of Big Pharma R&D significantly. Discovery of drug complications and of valid off-label use for drugs would be established quickly. If you want more information about the crisis in the American Medical-Industrial Complex and how Big Data will help reform, see my web site: creativedesignforheatlthcarereform DOT us and read my book, "Discovering the Cause and the Cure for America's Health Care Crisis."

April 2, 2012 - 12:37 pm

Sickweather mines big data for sickness tracking and forecasting in real-time: http://www.sickweather.com

April 2, 2012 - 1:21 pm

Interesting and unsettling. Your guests talked about the benefits of emergency preparation and evacuation, education and more, but not about the dangers of near-total transparency on the part of the public with too few safeguards against an intrusive government. The Supreme Court just voted to allow strip searches for even minor offenses. Seventy years ago, 6 million Jews and other so-called undesirables were rounded up and killed. Now it would be so much easier. And the intervening years have shown that millions more have been labeled undesirable by too many governments--large and small. It is not that unusual. I say go slow with this.

April 2, 2012 - 1:29 pm

This has to be one of the most Pollyanna-ish shows ever on Diane Rehm. You included at least one big data enthusiast and one near-enthusiast. In counterpoint, we get only the people who advise what we should already know: that big-data collection should occur only in concert with protections presumably provided by the government. But with the government being one of the interested parties in big-data collection, I hardly see that it could be the source of such protection.

My concerns were only compounded by Tom Gjelten's reply to one caller, from North Carolina, after cutting her off, something to the effect that such items as Big-Data would always excite "conspiracy theorists." The caller's comments were the most articulate of the entire show, but it was almost as if she was being squelched! Outrageous that Gjelten, inadvertently or otherwise, implied that the caller and her concerns were those of mere "conspiracy theories."

April 3, 2012 - 8:53 pm

The Diane Rehm Show is produced by member-supported WAMU 88.5 in Washington DC.