Big Data is the subject of a NIST-NSF workshop next week, but dealing with massive data sets is an issue that extends well beyond the science and engineering world. Credit: Falko Kuester, Calit2, University of California, San Diego.

Peter told us about emerging concerns at the federal level about the issues surrounding so-called “Big Data” that were expressed by agency top brass in a webcast back in March. The webcast coincided with the announcement (pdf) by the Obama Administration, under the auspices of OSTP, of the “Big Data Research and Development Initiative” and $200 million in new R&D funding. The announcement said that the funding would funnel through NSF, NIH, DOD, DOE and the US Geological Survey.

Big Data is the moniker assigned to very large data sets on the order of tera- and petascale and higher. Data sets this large are being generated in virtually every field of science and engineering, from hurricane data to gamma-ray diffraction data. Large data sets are also an issue for internet interests like Google and Amazon, and of course, Homeland Security.

Next week NIST, in collaboration with the NSF Center for Hybrid Multicore Productivity Research, will be hosting a “Big Data Workshop” to explore

  • State-of-the-art core technologies needed to collect, store, preserve, manage, analyze and share Big Data that could benefit from standardization
  • Potential measurements to ensure the accuracy and robustness of methods that harness these technologies

According to the preliminary agenda (pdf), the workshop will address data and algorithms, health care analytics, science analytics, business analytics and big data platforms. There also will be a poster session, panel session and networking reception.

More information is available on the website. Please note that preregistration is required in order to be admitted to the NIST campus and there are some additional requirements for non-US citizens.

Big Data is a big deal well beyond the shores of science and technology, too. A recent article in the Washington Post explains, “sociologists, software engineers, economists, policy analysts and others in nearly every field are jumping into the fray. And nowhere has big data been as transformative as it has been in finance.” In fact, according to the article, Big Data has become so valuable that the World Economic Forum now considers it a new type of economic asset, similar to oil.

Where is Big Data coming from outside of sci-tech and Homeland Security? The Post article reports that IBM says that services like Google, Facebook, Twitter and others are generating 2.5 quintillion bytes of data every day. That’s a lot of information.
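For a rough sense of scale, that figure can be converted into more familiar units. A minimal sketch, assuming decimal SI prefixes (1 exabyte = 10^18 bytes):

```python
# Convert IBM's "2.5 quintillion bytes per day" figure into
# exabytes per day, terabytes per second, and zettabytes per year.
# Decimal SI prefixes are assumed throughout.
BYTES_PER_DAY = 2.5e18  # 2.5 quintillion bytes

exabytes_per_day = BYTES_PER_DAY / 1e18            # 1 EB = 10^18 bytes
terabytes_per_second = BYTES_PER_DAY / 86_400 / 1e12  # 86,400 s in a day
zettabytes_per_year = BYTES_PER_DAY * 365 / 1e21   # 1 ZB = 10^21 bytes

print(f"{exabytes_per_day:.1f} EB per day")      # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")
print(f"{zettabytes_per_year:.2f} ZB per year")
```

In other words, these services alone generate on the order of an exabyte-class stream daily, or nearly a zettabyte per year.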

The article points out that accessing and using data is not without controversy as companies and governments expand their use and reuse of data, especially data that could be considered personal information. Some civil liberties groups worry about privacy and the prejudicial labeling of people, for example, as high credit risks, terrorists, etc.

Data scholars, too, have some concerns about “information asymmetry, where certain parties have an unfair advantage because they have better information than others—a phenomenon that some have argued shakes the foundation of a market economy.” That is, in the marketplaces that are attempting to be transparent and “perfect”—whether financial, scientific or technical—what are the societal implications of unequal access to data?