Data confidential, the confessions of a five-year-old operator
Three takeaways from a ‘career’ at the edge of contemporary journalism
Dear reader,
The truth is that I had little time for a newsletter. I wrote this article for Wired Italy, which was quite time-consuming. That’s why I indulged in one of the passions of my youth: meta journalism. Here, I try to put together what I have learnt doing data journalism and figure out some of the challenges I face when speaking to editors and readers. It is not always easy, but I still think this quirky kind of journalism is worth pursuing.
BRUSSELS -- This month marks my fifth anniversary in data journalism. I began in 2016 doing sentiment analyses of Donald Trump’s and Hillary Clinton’s speeches at their respective conventions. I have come a long way since the U.S. electoral campaign in 2016. Back then, I had no idea what data meant beyond simple charts, open data, or a few Excel commands. Now, I think I have a better understanding of what data is, why it matters, and why data journalism stands out from other specializations in the business. Here are some takeaways from my career.
Data is arbitrary by definition
Data do not exist in nature. Units of measurement do not exist in nature. Everything we collect as data is arbitrary because collecting it means choosing a certain point of view. This does not mean that data are unreliable; it means that when data journalists approach a dataset, they need to know that it is only one of the many possible representations of reality.
For example, take your living room: you can describe it in words (‘there is a bookshelf on the left-hand side of the room, there is a TV set in the corner…’), you can take a picture of it, or you can collect data (type of furniture, dimensions, and quality). Each of these representations is good in its own context. Do you want to have an inventory? Data collection is good. Do you want to describe your living room to a friend? A dataset is not good.
Besides, how do you choose definitions and units of measurement? We all live in a world where a bookshelf is a bookshelf and length measurements are standardised (metres, inches, feet…).
But this does not happen all the time. There are parts of human knowledge based upon data where a substantial part of the work lies in defining the unit of measurement and the problem itself (here is an example). That makes my job more difficult or more interesting, depending on how I get up in the morning.
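To make the living-room example a little more concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the `Furniture` class, the two pieces of furniture, their widths and their quality labels are assumptions, not real data. It only shows that the same room can be stored as records, and that the unit those records use is a choice somebody made.

```python
from dataclasses import dataclass

CM_PER_INCH = 2.54  # definition of the inch in centimetres


@dataclass
class Furniture:
    """One row of a hypothetical living-room inventory."""
    kind: str        # what we decided to call the object
    width_cm: float  # we chose centimetres; inches would be equally valid
    quality: str     # a subjective label chosen by whoever collects the data


# Hypothetical inventory: one possible representation of the room.
living_room = [
    Furniture("bookshelf", 80.0, "good"),
    Furniture("tv set", 120.0, "worn"),
]


def count_by_kind(items):
    """Count how many pieces of each kind appear in the inventory."""
    counts = {}
    for item in items:
        counts[item.kind] = counts.get(item.kind, 0) + 1
    return counts


def width_in_inches(item):
    """Express the same width in a different, equally arbitrary unit."""
    return item.width_cm / CM_PER_INCH


print(count_by_kind(living_room))
print(round(width_in_inches(living_room[0]), 1))
```

The records are handy for an inventory and useless for telling a friend what the room looks like, and every field name in them is a decision that could have gone another way.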
Data journalism is journalism
Last weekend, I published a story. A friend shared it in a WhatsApp group, and someone told me something like ‘well, this is not really journalism, it is more akin to a university dissertation.’ Alright, so let us start from the fundamentals: what is journalism?
In the Italian and French traditions, journalism is the bastard son of literature: it was something writers did to pay their rent. In the Italian tradition, facts are less important than good writing. According to the book ‘The Invention of News: How the World Came to Know About Itself’ by Andrew Pettegree, though, the origins of journalism are to be found in trade in the 17th-century Netherlands, where printed sheets of paper were used to report the daily prices of goods in the port of Rotterdam.
If we trust this theory, data journalism was there at the very inception of the news industry as we know it: merchants in Rotterdam were buying paper reports to get the equivalent of an Excel file and decide their trading strategy. How different is that from the listings in the paper edition of the Financial Times?
Journalism has evolved since the 17th century. England invented parliamentary journalism, North America started its journalism industry, then came the newswires, the five Ws … and, eventually, data fell into oblivion.
Yet data surfaced again in the United States in the 1950s, when TV networks tried the first experiments in predictive statistics to figure out which candidate would win the presidential elections. Back then computers were as big as rooms and located only in universities, just to give a bit of context. Afterwards, people started talking about computer-assisted reporting, ignoring, as happens too often in the business, the history that led us to where we are.
Data analysis is creative
Data don’t speak for themselves. Data need to be interpreted. Given that each of us has a bias, data analysis (with quantitative or qualitative tools) requires some degree of imagination. I will explain this point with an example.
A few months ago, I wrote a newsletter where I suggested that there is something unmeasurable that is missing from the idea of western democracy. I was arguing that, because the number of developed but non-democratic countries has grown since 1989, global public opinion has lost faith in the very idea of liberal democracy, which, as such, needs some degree of faith to work.
The point I was making is not that you must mix religion and faith, but that democracy with its rites (take the Queen’s speech, the American presidential inauguration, or the July 14th parade in Paris) is more a civil religion (Montesquieu, anyone?) than a form of government. And, given its non-rational legitimization (if democracy were rational, why didn’t the corporate world adopt it, by the way?), democracy works only when people believe it does.
I acknowledge that this is a far-fetched and uncomfortable theory. But as Howard Becker, one of the greatest social science methodologists ever, said, social systems are like machines: you throw something into one, the machine processes it and gives you something in return. When you do social science, you should imagine a machine that would produce the output you observe, given the data you have. And, in this case, interpretation is more or less free, so journalists are free to dare (as long as they act in good faith).
Visualization is only part of the story
When people think about data journalism, they mostly think about charts. This is quite annoying because there is so much more. The beauty of data journalism is doing research in a structured and replicable way in a space that does not exist naturally. To visualize this space, you make charts. But charts are not the aim: they are what forensics is to a prosecutor or photography is to war reporting, no more than that.
The real step forward is to develop formats where the body of an article explains the chart, particularly when you use software like Tableau. In this case, charts are not just charts: they are small web apps that allow the reader to autonomously explore the data and figure out what is going on.
Here, the forensic role of charts is even clearer: the news is in the body, and charts help explain it. That explanation requires hard work on the journalist’s side, work that is often overlooked because everyone likes charts. And this is frustrating.
The Remains of the Day
Data journalism is great. It forced me to learn so much stuff (technically, academically) that it would take a book to put it all together. On the other hand, outside a few select news organizations, I have had trouble articulating this idea of data journalism (I could write a book about that too). Yet I do believe that my mindset is the right one. And, looking around, I think I am less and less alone in this field. Is it frustrating sometimes? You bet it is, but when you see how much people appreciate your work, there is no better feeling in the world.
Thanks for your attention.