Tuesday, April 09, 2013

The fear of 'big data'

In the early days of my undergraduate education we read a lot about the relation between the model in the computer and the part of reality that the model was supposed to represent. My most influential professor at that time, Kristo Ivanov, always warned us about the danger of seeing data as a 'resource', like a natural resource to be harvested. He always stressed that all data, and even more so information, is the result of a process that involves mechanisms, procedures, measuring, and choices that ultimately rest on values and are driven by intentions. He always claimed: "there is no 'raw' data". It is fascinating to see the exact same discussion emerging today in the wake of the enormous interest in 'big data'. The same topics that were heavily discussed in the late 70s are again being examined. The content of a new book, "Raw data is an oxymoron", edited by Lisa Gitelman (MIT Press, 2013), is evidence of that (here is a good review).

The discussion in the 70s about data, information and knowledge, and about how (computerized) models form and shape our reality, was maybe exaggerated and raised way too early. The consequences of the use of data at that time were rather minimal: the limited computational abilities and, even more, the lack of 'data' made the fear of data being used wrongly quite 'academic'. However, with the advent of 'big data' and with the computational powers of today, the consequences are now real and have to be addressed.

1 comment:

LB said...

Have you read 'The Lean Startup'? Eric Ries spends some time in this book on measurement, criticizing what he calls 'vanity metrics' and advocating for their replacement with 'actionable metrics'.

Though not 'big data' necessarily, to me it certainly ties into this idea of bias in numbers (e.g. choosing what to measure), in addition to the more obvious and common example of interpreting those data.