Big Data, data integrity, and the fracturing of the control zone

Despite all the attention paid to Big Data and the claims that it represents a “paradigm shift” in science, we lack an understanding of which qualities of Big Data may contribute to this revolutionary impact. In this paper, we look beyond the quantitative aspects of Big Data (i.e., lots of data) and examine it from a sociotechnical perspective. We argue that a key factor distinguishing “Big Data” from “lots of data” lies in changes to the traditional, well-established “control zones” that facilitated clear provenance of scientific data, thereby ensuring data integrity and providing the foundation for credible science. The breakdown of these control zones is a consequence of the manner in which our network technology and culture enable and encourage open, anonymous sharing of information, participation regardless of expertise, and collaboration across geographic, disciplinary, and institutional barriers. We are left with a conundrum: how to reap the benefits of Big Data while re-creating a trust fabric and an accountable chain of responsibility that make credible science possible.