We have a picture in the studio that serves to remind us of the chaotic and dangerous nature of our main building material. It is a still from The Dark Knight and shows Heath Ledger’s Joker in post-crime rapture. The legend is ‘Data Doesn’t Care’. This rather existential phrase is inspired by an entry in the blog of Ben Horowitz called ‘Nobody Cares’.
Data often has no inherent meaning or form. Location data won’t have the right locations, time-based sets will have different deltas, and so forth. Data should be assumed to be a chaotic primordial soup until investigated further. The assumption that ‘stories emerge’ naturally from data is false: they need a lot of coaxing, and even before that the data, as a material, needs a lot of treatment to be workable. We have worked with many different data-sets, and with the exception of financial (and some sports) data, much of what we deal with is unstructured, incomplete and unsound.
Out of necessity, we employ robust ingestion and processing to give the data integrity. Before visualising it, we often need to write algorithms or weight the data somehow so that it means something to users. This is why an emerging maxim of data journalism is ‘find someone as au fait with the domain as the technology’. Nothing simply emerges from the data: you need a hunch to take to it, a hypothesis to investigate.
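To make the point concrete, here is a minimal sketch in Python of what one such processing step might look like. It drops records that cannot be parsed and averages the rest into fixed-width time buckets, so that a set with uneven deltas ends up on a regular grid. The function name `clean_readings`, the `time` and `value` field names and the one-hour step are all assumptions made for illustration, not a description of any real pipeline.

```python
import math
from datetime import datetime, timedelta


def clean_readings(raw_rows, step=timedelta(hours=1)):
    """Reject unusable rows, then average values into fixed-width buckets.

    Assumes (hypothetically) that each row is a dict with an ISO 8601
    'time' string without a timezone and a numeric 'value'.
    Returns (series, rejected_count).
    """
    buckets = {}
    rejected = 0
    for row in raw_rows:
        try:
            ts = datetime.fromisoformat(row["time"])
            value = float(row["value"])
            if not math.isfinite(value):
                raise ValueError("non-finite value")
        except (KeyError, TypeError, ValueError):
            # Missing field, wrong type or unparseable content: count and skip.
            rejected += 1
            continue
        # Floor the timestamp to the start of its bucket.
        bucket = ts - ((ts - datetime.min) % step)
        buckets.setdefault(bucket, []).append(value)
    # One averaged value per bucket, in chronological order.
    series = {b: sum(v) / len(v) for b, v in sorted(buckets.items())}
    return series, rejected


rows = [
    {"time": "2012-03-01T09:05:00", "value": "10"},
    {"time": "2012-03-01T09:50:00", "value": "20"},
    {"time": "2012-03-01T11:10:00", "value": "7.5"},
    {"time": "not a date", "value": "3"},
    {"time": "2012-03-01T12:00:00", "value": None},
]
series, rejected = clean_readings(rows)
```

Returning the count of rejected rows alongside the cleaned series is a deliberate choice here: how much was thrown away is itself something a person who knows the domain should look at before anything is drawn.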
If ‘Data is the New Oil’, then many projects will blow up for failing to respect the combustible nature of this stuff.