Coronavirus crisis: can data visualization save lives?
From numbers to graphs and charts: how COVID-19 communication embraced data visualization to inform, educate and change people's behaviour.
This essay is 3800 words, a 29-minute read.
February was all about numbers: crude, tabular, sometimes put on a map or placed in a bar chart. The daily counting of new infections and deaths marked the rhythm of the day and, with hourly case updates and endless streams of information, incited more panic than awareness.
Those numbers were often communicated without the context needed to give them meaning. Take the number of infections: depending on how early testing started, and on what scale, that figure can be irrelevant or – worse – misleading.
That numbers alone are not enough to promote awareness and political action, as well as individual behavioural change, is not new.
We know it from climate change communication: it struggled to find listening ears until we experienced its impact in our own lives, with summer heat, bush fires and floods, droughts and tornados devastating our western world.
March 2020, however, marked a turning point for data communication around the COVID-19 epidemic. Together with life scientists and independent information designers, the news media and agencies that embraced data visualization a dozen years ago made a productive effort to overcome the complexity of this type of data communication.
The result has been a palette of data-driven storytelling, explanatory illustrations, and charts “made for the laypeople”, that represents a success story of how to educate, inform and – above all – trigger collective behavioural change. A change that truly matters, since it contributes to saving lives.
So tweeted Carl Bergstrom, a professor of biology at the University of Washington, on 6th March 2020, in a thread introducing the “flattening the curve” chart. A few days later, the New York Times called this chart “the defining image of the crisis”.
Why is this chart so effective? It is simple but not simplistic. Labels and text are clear and concise, and let you picture the alternative scenarios in the blink of an eye. The reference line (indicating the healthcare system capacity) gives us an immediate understanding of the impact of both scenarios.
No numbers are included; they are not needed. This graph does not need complications. Clear and to the point, easy to read, mobile-friendly, this graph tells you what you need to do, everywhere: help flatten the curve so we can keep the number of COVID-19 cases manageable for our healthcare systems and ensure everyone can get the medical assistance they need.
What can we learn from the many visualizations dealing with the COVID-19 epidemic? What basic principles should we apply? Which formats help, and which not so much?
Here is a list of my ten takeaways from the dozens of charts and pieces of data journalism I have seen in the last four weeks.
“Why outbreaks like coronavirus spread exponentially, and how to ‘flatten the curve’,” is a visual explainer of how a simulated virus could spread through human contact with or without social distancing or quarantine, displayed with coloured balls bouncing off each other to show sick, healthy and recovered individuals.
Published on 14th March by the Washington Post and made available to all without a paywall, the story was tweeted by the former US president Barack Obama to his 115 million followers and used by President Maduro in Venezuela to explain the containment measures on TV.
Since then, it has become the most-read story in the history of the Washington Post's website and has been translated into 13 languages.
The author, the graphics reporter Harry Stevens, at first wanted to use real-life COVID-19 data and simulate the actual virus, but that would have required running computationally intensive mathematical models on supercomputers. So he went back to the original question and reflected on the purpose of making such a visualization.
What mattered was not so much to run a simulation on real data as to make visible the impact of different measures on how such a virus spreads: the purpose determined the making of the visualization – the visual metaphors, the colours used, the way to show the different scenarios one after the other.
The result is storytelling that takes the reader through each step of the simulation. It starts by introducing the ball metaphor and then displays the balls bouncing within a rectangle, simulating how human contact affects the spread of the infection under different scenarios: (1) no containment measures, (2) forced quarantine (Hubei-like), (3) social distancing, (4) more social distancing.
What made this illustration so powerful was, according to many readers, its ability to reduce anxiety and help them understand the idea of “social distancing”, a term very few people outside the healthcare community had ever heard.
It took two weeks to develop such a piece of visual journalism. The first part of that time was spent thinking about the “why” of the visualization, before getting into the “how” and the “what”.
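The effect that the explainer makes visible can also be reproduced with a far cruder tool. The sketch below is not Harry Stevens' bouncing-ball simulation: it swaps in a textbook SIR (susceptible, infected, recovered) model, stepped one day at a time, and every parameter value is invented for illustration rather than fitted to COVID-19. Under those assumptions, it compares the peak number of simultaneously infected people with and without reduced daily contacts.

```python
def simulate_sir(contact_rate, recovery_rate=0.1, population=1000.0,
                 initial_infected=1.0, days=300):
    """Return the number of infected people per day in a basic SIR model.

    All parameters are hypothetical; this is not the Washington Post simulation.
    """
    susceptible = population - initial_infected
    infected = initial_infected
    infected_per_day = [infected]
    for _ in range(days):
        # New infections scale with contacts between susceptible and infected people.
        new_infections = contact_rate * susceptible * infected / population
        # A fixed share of the infected recovers each day.
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        infected_per_day.append(infected)
    return infected_per_day


# Illustrative contact rates only: the lower value stands in for social distancing.
no_measures = simulate_sir(contact_rate=0.4)
social_distancing = simulate_sir(contact_rate=0.15)

print(f"Peak infected, no measures:       {max(no_measures):.0f}")
print(f"Peak infected, social distancing: {max(social_distancing):.0f}")
```

The lower peak in the second scenario is the "flattened curve". Real epidemiological models are far more complex, which is exactly why the purpose of the visualization, not the realism of the model, drove the design.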
For information designers, it is tempting to go straight to the available data and build a visualization out of it, as fast as possible. Once verified, they might think, those data are okay to develop models, graphs, visual explainers. Unfortunately, the reality of an epidemic is different for many reasons:
a. Information on an outbreak is subject to change rapidly: your visualization needs to be one that can be updated as soon as new knowledge is available. It is not only about making bar charts that take data from an online database (that would be obvious). It is about making any visualization (e.g. transmission mechanics) subject to amendment, extension, correction.
b. Some data can be valid in one context (a country, a region with specific demographics) and not in another. If your visualization aims to reach many audiences, it might be relevant for some and misleading for others. If some data have limited relevance, it is better to cut them and stick to the few data that are meaningful for all.
c. Many sources currently used by journalists are still “preprint” studies, not yet fully peer-reviewed; that is, they might be hypotheses and, as such, be disconfirmed, corrected or even withdrawn very soon. 50% of the reporting done so far on COVID-19 is already obsolete and/or invalidated, but it still floats on the sea of the Internet.
Do not let that happen to your visualization.
So, the best you can do is:
The mathematician Adam Kucharski, author of The Rules of Contagion: Why Things Spread - and Why They Stop, has provided the best explanation of the ratios and numbers related to the pandemic outbreak in The New York Times. It is a very long read, worth every minute spent before starting your information design project.
Dr Siouxsie Wiles, an Associate Professor at the University of Auckland, provides an excellent explanation of the uncertain data behind the pandemic: you can listen to her in this podcast interview by Datajournalism.com.
To show the WHAT, WHERE, and SIZE of the outbreak is of paramount importance for a pandemic. That is why, after the first stage in January and February dominated by bar charts and rankings, dashboards now dominate the disease communication landscape.
The dashboard developed by Johns Hopkins University plays a central role by collecting data from worldwide sources on an almost hourly basis; other scientific organizations and news media rely on its data to build their local data visualizations. One example is the dashboard developed by the German Robert-Koch-Institut for the German disease data.
The Johns Hopkins University dashboard, born from a student project, is now visited by more than 1 billion users worldwide every day! But is a dashboard the best solution for public communication of the COVID-19 disease?
Such dashboards can be overwhelming and – if you do not read the tiny footnotes that help to understand the data – confusing. Therefore, media should make an effort to go back to the original database (in the case of the JHU, it is available on GitHub here) and develop their own custom versions, selecting and displaying data subsets to make reading and interpretation easier. A good example can be found here, on the Berliner Morgenpost.
In general, news media should not rely only on dashboards, but develop a set of dynamic graphs and charts, with fewer data and more explanations, so as to guide readers step by step.
One of the challenges of comparing the virus spread trajectory country by country is that the outbreak started sooner in some regions, later in others.
One solution is to keep the time origin constant, that is: setting a standard T0 for all the country curves, defined as the day when more than 100 cases were recorded (see 91-divoc.com), or when more than three daily deaths were first recorded (see the Financial Times/coronavirus-latest, free to read with no paywall).
Visualizing the curves from a common T0, you can grasp how similar (in most cases) and how different the outbreak is depending on country, region or city (FT.com). More importantly, you get a sense of how far ahead you are and what is coming next.
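The alignment on a common T0 can be sketched in a few lines. This is a minimal illustration, assuming each country is a plain list of cumulative case counts, one value per day; the function name, the country labels and all the figures are invented for the example and are not taken from 91-divoc.com or the Financial Times.

```python
def align_to_t0(cumulative_cases, threshold=100):
    """Return the series starting on the first day the count exceeds `threshold`."""
    for day, count in enumerate(cumulative_cases):
        if count > threshold:
            return cumulative_cases[day:]
    return []  # the threshold was never reached


# Hypothetical cumulative case counts, invented for illustration only.
series = {
    "Country A": [2, 15, 60, 120, 250, 500],
    "Country B": [0, 0, 1, 8, 40, 130, 270],
}

# After alignment, index 0 of every list is "day 0 since the 100th case".
aligned = {name: align_to_t0(counts) for name, counts in series.items()}

for name, counts in aligned.items():
    print(name, counts)
```

Once every curve starts at its own T0, plotting them on a shared axis of "days since T0" makes the trajectories directly comparable, whatever the calendar date of each outbreak.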
We still see lots of media calculating the COVID-19 fatality rate by dividing the cumulative number of deaths by the number of recorded cases (CFR, Case Fatality Rate). Simple math, but misleading.
The challenge is that it’s tough to calculate an accurate, generalisable case fatality rate for this disease when the data we have is so uncertain, starting from the real number of infected.
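A small worked example shows why the simple division misleads. It assumes one fixed number of deaths and varies only the number of recorded cases, as would happen when countries test at different scales; all the numbers are invented for illustration.

```python
def case_fatality_rate(deaths, recorded_cases):
    """Naive CFR: cumulative deaths divided by recorded cases."""
    if recorded_cases == 0:
        raise ValueError("CFR is undefined without recorded cases")
    return deaths / recorded_cases


# Hypothetical scenario: same deaths, different testing coverage.
deaths = 100
for recorded_cases in (1_000, 5_000, 10_000):
    cfr = case_fatality_rate(deaths, recorded_cases)
    print(f"{recorded_cases:>6} recorded cases -> CFR {cfr:.1%}")
```

The outbreak is identical in all three rows; only the denominator changes, and with it the apparent fatality rate.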
The number of casualties per 1 million people can be misleading too. It sounds like a way to make countries more comparable, but it is not. Nor is it meaningful to compare the COVID-19 fatality rate against that of other diseases, especially ones that occurred a long time ago, such as the Spanish Flu. In all those cases, both infographics and pieces of journalism can be intellectually fascinating, but they are not useful to understand what is happening and what to expect.
This visualisation does not help:
Looking for a valuable alternative that truly helps to see whether we are on track to contain the pandemic and whether there are countries we can learn from?
Reuters’ “Breaking the wave” has taken a different approach to visualising the death toll day after day, by focusing on the death growth rate in each country: did the deaths double, triple, quadruple compared to the day before? When do casualties start to increase less or to stay stable, until they hopefully become zero, like in China?
In the Reuters graph, you see the death toll as a wave. The wave is made of daily lines measuring how much the number of fatalities grew in the last seven days: “breaking the wave” means bringing the growth rate down until the number of deaths is no longer multiplying each week.
As the explainer text points out, “it is important to keep in mind that deaths are a lagging indicator, which means we may only see a decrease in response to effective containment policies weeks after they are implemented”. However, the shape of the wave is the best indicator of how effective the response to COVID-19 is in reaching its ultimate goal: to save lives.
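A seven-day growth measure of this kind can be sketched as follows. The exact formula Reuters uses is not given here, so this is an assumption: the cumulative death count on a given day divided by the count seven days earlier. The data are invented for illustration.

```python
def weekly_growth_factor(cumulative_deaths, window=7):
    """For each day, divide the cumulative count by the count `window` days earlier.

    Returns None for days where the earlier count is zero.
    """
    factors = []
    for day in range(window, len(cumulative_deaths)):
        previous = cumulative_deaths[day - window]
        factors.append(cumulative_deaths[day] / previous if previous else None)
    return factors


# Hypothetical cumulative deaths over two weeks, invented for illustration only.
cumulative = [10, 14, 20, 28, 40, 56, 80, 100, 120, 140, 150, 160, 168, 176]

for factor in weekly_growth_factor(cumulative):
    print(f"deaths multiplied by {factor:.1f} over the previous seven days")
```

A factor that keeps falling towards 1 means the cumulative total has almost stopped growing: under this definition, that is the wave "breaking".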
In “How Charts Lie”, information designer Alberto Cairo debunks some myths about the power of data visualizations. One of the most common misconceptions is that data visualizations can speak for themselves, being so intuitive that you can understand them at a glance. It is what we know as the “A picture is worth a thousand words” principle.
On the contrary, 50% of the communication value of your visualization comes from the words surrounding it, and from how good you are at explaining and directing your viewers to the right information.
In the case of COVID-19 data visualizations, annotations play a critical role, mainly because not everybody knows the difference between a linear and a logarithmic scale or is familiar with medical terms. Never forget that the entire world could potentially access your visualization, once published on the Internet: you have to be mindful of how your charts might be misinterpreted.
In the Financial Times charts referenced above, John Burn-Murdoch has added small annotations on each graph to summarize what the featured countries have done to help flatten or change the trajectory of the infection curve.
He went a step further: after hearing lots of questions from readers about the set of charts, he shot a video explainer in which he goes into more depth on the methodology and design choices and clarifies how to read, use and interpret the charts.
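The difference between a linear and a logarithmic scale, which such annotations have to explain, can be shown with a few numbers. The sketch below assumes a hypothetical series that doubles every day: on a linear scale each step is bigger than the last, while on a base-10 logarithmic scale every step has the same height, so steady doubling draws a straight line.

```python
import math

# Hypothetical case counts that double every day, invented for illustration only.
cases = [100 * 2 ** day for day in range(6)]

# Step sizes as they would appear on a linear axis.
linear_steps = [later - earlier for earlier, later in zip(cases, cases[1:])]

# Step sizes as they would appear on a base-10 logarithmic axis.
log_steps = [math.log10(later) - math.log10(earlier)
             for earlier, later in zip(cases, cases[1:])]

print("linear steps:", linear_steps)
print("log steps:   ", [round(step, 3) for step in log_steps])
```

On a logarithmic chart, a curve that bends away from the straight line is therefore the signal to annotate: it means the doubling time itself is changing.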
There are a couple more things to consider when making COVID-19 data visualizations. Such artefacts are often shared on social media and consumed on mobile devices. Therefore, all design choices must be made carefully and be at least mobile-friendly, if not mobile-first, so that the result is digestible and leaves no room for misinterpretation.
We are almost overwhelmed by data on COVID-19. Some might think that we have more knowledge than ever before on a pandemic and that, with all those data available, it could be possible to develop more certainty on how the disease will progress.
The truth is, we are at the beginning of the pandemic, although, given the worldwide lockdown, it feels as if it has been our companion for a long time already. And being at the beginning, what we know for sure is limited.
For scientists and statisticians, developing a good model is an almost insurmountable challenge as of now. All predictions are based on statistical models that rely on data, and those data are not yet consistent.
Take your time to read this feature article by FiveThirtyEight, the data journalism title 100% focused on making sense of statistical analysis: Why It’s So Freaking Hard To Make A Good COVID-19 Model.
Modelling COVID-19: too much uncertainty. Source: 538 by ABC News Internet.
So, the best thing you can do is to go back to the basics.
Show what comes from trusted sources. Do not try to mix data up to make your own models unless you have the assistance of an expert epidemiologist. Always explain, before and after your visualisation, what the main uncertainties are.
And never, really never, be judgemental. Nobody has the ultimate answer to the question of why mortality in some countries is so much higher than in others, nor why the infection curve developed differently in some countries. Until things calm down, you never know what might happen.
That is why any tale – including a data visualisation – must be cautionary, as is this insightful, masterly piece of journalism by The New York Times: The Lesson of Lodi and Bergamo – Timing, Timing, Timing, Timing, Timing.
Information design works when it enables conversations and helps to answer essential questions. In the news and commercial media, part of the so-called attention economy, it can be tempting to generate visualizations for the sake of driving attention and engagement. Let’s put it this way: in the COVID-19 crisis too, some data visualizations have adopted the same clickbait tactics as some written-for-social-media articles.
Instead, it is crucial to select and focus on the data that matter most to the lives of your audiences and to make an effort to transform them into meaningful visual information.
VG.no, Norway’s most-read online newspaper, broke down the numbers of local infections to provide this picture: how many people were (1) hospitalized, (2) taken to intensive care, (3) in need of ventilation. Such a degree of information, once accurately retrieved, is easy to visualize and very meaningful for the readers.
After all, the “Flatten the curve” message is all about how many people have severe disease and deserve to be taken care of accordingly, within the capacity of the system.
Another important piece of information is the actual demographic distribution of infections, especially to dismantle some misconceptions that circulated between February and March 2020. This information is only relevant at the local level, since the demographic and sociographic profile differs from country to country.
In many cases, such visualizations made people aware that COVID-19 could affect everyone, although in different proportions.
I am happy that data visualizations went mainstream. But, at the same time, I see the danger of visualizations done for the scientific community going public, especially on social media.
“Open knowledge” is a seductive idea. Still, we have seen in the case of COVID-19 how social media channels amplified and distorted messages by sharing DataViz extrapolated from scientific conversations.
The worst example dates back to February and has to do with the website Nextstrain.org, an open-source bioinformatics project providing interactive, live data visualizations that map the evolving pathogen genome data. The goal is good and welcome: to offer the broader community of virologists, public health officials and citizen scientists access to tools generally reserved for a few labs.
Unfortunately, once these visualizations are shared on social media, things become messy. So it happened in the first week of March 2020, when one of the Nextstrain project promoters, Trevor Bedford, shared a visualization suggesting that almost all infections in Europe and Italy could have come from Bavaria, where one of the first European cases of COVID-19 was recorded on 28th January.
It was just a hypothesis, soon abandoned. Still, once it reached a broader audience, it became a pretext for accusing Germany of having introduced the virus to Italy and the rest of Europe.
One complex visualization, not fit for the public domain, became the basis for a highly politicized, nonsensical, utterly useless conversation. It landed in Italian newspapers and on TV shows as well.
Something like that should not happen. It is up to scientists, information designers, and journalists to consider whether a visualization serves an actual information need in the public domain. Charts and graphs can lie, and even more, they can be misused.
So, if a viz is too complicated, or if it needs way too much explanation, or if it runs the risk of being taken out of context and misinterpreted: well, better to keep it for restricted use only.
In his books and presentations, Alberto Cairo often reminds us that one of the biggest challenges for information design is our moral blindness. We dislike being held accountable for the information we produce or spread over our networks.
Every visualization, when ambiguous or designed poorly, can be misused. So, in the case of a tragic disease like COVID-19, you need extra care.
How you show the disease outbreak on maps, rankings and comparison charts, not to mention the colours you choose to display infections and deaths: every design choice matters. And can help, or hurt.
The coronavirus is not a death sentence. Most infected people will survive. Represent them with respect, using good sense when choosing visual metaphors and colours: avoid too many red-coloured points, avoid crosses to show deaths.
Ensure you are not stigmatizing people who are from countries and regions with lots of cases. How you design maps and charts can have a vast influence on how people perceive the danger.
Use text to remind readers that those data points are humans; numbers are people; curves and lines are tales of human suffering.
Unfortunately, the COVID-19 disease is also opening a Pandora's box of social and economic inequality, of poor political choices, of medical systems impoverished by corruption and hyper-privatization…
That is why, when producing a data visualization or sharing a dataset, think about whom you can educate, whom you might affect and how.
COVID-19 data visualizations have proven that they are there not only to document what is happening, but also to promote attitude and action. Always think about how your DataViz can help. Not about how popular it can get.
Reuters made a considerable effort to bring together data visualizations and the tragedy behind those data. The result was a great piece of data journalism, filled with empathy: something we need now more than ever.