The choices we make when we visualise data influence how people see, understand, and react to the underlying data. This session includes a series of slides to build your knowledge, a set of five short videos to demonstrate creating a graph and a short quiz to test your knowledge. The expected length of time to complete all segments is 25 minutes. In this introductory session to visualising data using examples from the COVID-19 pandemic, you will learn: how to identify and visually represent different relationships in data; and how to prepare and structure your data in order to make a simple line graph.
How to Navigate the Session
The progress bar at the bottom of your screen shows where you are in the session. To skip between pages, use the navigation menu at the bottom left of your screen. Each page introduces a tip, concept or best practice for visualising data during the COVID-19 pandemic. Move through the pages using your keyboard or by clicking on the right and left arrows at the bottom right of your screen. Certain pages offer the option to scroll down to learn more about a particular topic. Navigate to these vertical pages using the down arrow on your keyboard or by clicking on the down arrow at the bottom right of your screen.
The navigation arrows at the bottom right of your screen are there to lead you through the session. In the example below, the faded left arrow indicates completed pages, whereas the down arrow and the right arrow indicate pages you have yet to visit. Now that you understand how to navigate this session, click on the right arrow to start.
We visualise data to explore it. Through exploratory data visualisation, we are able to uncover different relationships and to discover the most meaningful information to highlight or analyse further. We also visualise data to explain what we are seeing in it. Explanatory visualisation can be used to inform, persuade and engage an audience. This is the type of visualisation we are going to focus on here.
Before you start, get to know your data
Before digging into a COVID-19 dataset, it is important to spend some time understanding what can and cannot be said based on the data. Too often we jump into analysis before taking the time to understand what aspects of the world our data represents. The first step is to understand the definitions of the data included in your dataset. To do this, review the dataset's metadata and, where one is published, its data dictionary.
Metadata is data about data. Reviewing the metadata before you download a new dataset can save you time and tell you whether the dataset has what you need. The Humanitarian Data Exchange (HDX) requires all data shared on the platform to include a few essential metadata fields. Metadata provides basic information about a dataset such as:
Source: who is releasing the dataset.
Date: the date the dataset was initially released, the date it was last updated, and the expected update frequency, which lets you anticipate the next release.
Location: the geographic extent and granularity of the dataset (e.g. a specific set of countries with data at admin1 level).
Methodology: how the data was collected.
Some organisations publish data dictionaries along with their data. Headers in a dataset are often condensed to save space and keep them machine readable. For example, a field with a header named “cases” could mean cumulative cases, daily cases, or active cases. A data dictionary defines each variable in a dataset so that you understand what it represents. The example below gives the definitions of positive and suspected patients as well as the data type (numeric).
The accuracy of these numbers depends on several factors. The number of confirmed cases depends on the extent of testing being performed in a country, the accuracy of the tests, and the lag between when a person is infected, becomes symptomatic and is tested. The number of deaths depends on whether deaths outside of hospitals are counted and whether tests are performed within hospitals before or after death. The number of active cases is calculated using data on recovered cases. Not all locations provide reliable data on recovered cases, so active case figures may not be reliable either. To ensure that your data is as up to date and accurate as possible, prioritise data from reputable sources and data with well-documented collection methodologies.
HDX houses a diverse range of datasets on COVID-19 for humanitarians. These include:
Global case data from the Johns Hopkins School of Public Health and the World Health Organization;
Testing data from Our World in Data;
Global travel restrictions and airline information from the World Food Programme.
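To see why unreliable recovered-case data makes active case figures unreliable, consider the following minimal Python sketch. It assumes the commonly used definition of active cases as confirmed cases minus deaths minus recoveries. That formula, the function name and the numbers are assumptions for illustration and are not taken from any particular dataset.

```python
def active_cases(confirmed, deaths, recovered):
    """Estimate active cases (assumed formula: confirmed - deaths - recovered).

    Returns None when no recovered figure is reported, rather than
    silently treating missing recoveries as zero.
    """
    if recovered is None:
        return None
    return confirmed - deaths - recovered


# Hypothetical figures for illustration only.
print(active_cases(confirmed=1000, deaths=50, recovered=300))
print(active_cases(confirmed=1000, deaths=50, recovered=None))
```

Returning no value when recoveries are missing is one possible design choice. Treating missing recoveries as zero would instead overstate the number of active cases.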
Choosing the right visualisation can feel overwhelming. There are a lot of options out there. The best way to narrow them down is to identify the relationship between the variables you want to highlight. In this section, we will explore the following relationship types and discuss good chart options for each: change over time, comparison across categories, distribution, part-to-whole and spatial.
If you have time series data, a common relationship to visualise is change over time. Make sure to plot the timeline on the X axis. The following charts are options to illustrate trends over time.
Source: Andy Kriebel & The Financial Times
A line graph will emphasise trends by showing a series of points connected by straight lines. Adding visual markers for important events, e.g. when stay-at-home orders went into effect, is an easy way to add context. Multiple time series should always use line graphs.
Source: Our World in Data
Time series bar charts are used to emphasise individual values at distinct points in time. They work best when data points are at equal intervals such as days, months or years, e.g. the number of confirmed cases reported daily. Always make sure your Y axis starts at zero.
Source: Worldometer
Stacked area graphs are useful for comparing multiple data series changing over time. The values of each series are displayed on top of each other, which allows you to visualise the change in the total value, e.g. the number of confirmed COVID-19 cases globally, and how the values are evolving in component groups, e.g. by region.
Visualisations will often use size, length and position to help viewers explore differences and similarities between categories and data series.
Source: Andy Kriebel & The Financial Times
Ordered bar and column charts are excellent for showing position and comparing across categories. When your labels are long, use horizontal bars. It is almost always preferable to sort your data, as this makes the chart easier to read.
Often called bubble charts, these charts are typically used to compare categorised circles and show the relationships between them using position, colour and size.
Distribution visualisations are often used to explore data as they show the values in the dataset and how often they occur. The shape (or the skew) of a distribution can be a memorable way of highlighting the lack of uniformity or equality in the data.
Source: Andy Kriebel & The Financial Times
Histograms are useful for understanding the spread of the data. They show where values are concentrated, what the extremes are and whether there are any gaps. They may look similar to a vertical bar chart but differ in an important way: the horizontal (X) axis is numerical, not categorical.
Source: Springer
Box plots (also referred to as box and whisker plots) are popular among statisticians and researchers. They are helpful for summarising multiple distributions by showing the median (centre) and range of the data. While useful for exploring data, not everyone understands how to interpret box plots. Use them sparingly as an explanatory tool.
These charts are useful for showing how various parts of the data make up the whole. While useful with a small number of categories, they can quickly become difficult to read. If you are mainly interested in comparing the size of the parts, a simple bar chart will often be easier to read.
Source: Andy Kriebel & The Financial Times
Over time, the donut chart has become increasingly popular. The donut chart is similar to a pie chart, but the centre can be a good way of making space to include more information about the data (e.g. the total).
Source: OCHA Financial Tracking Services
The proportional stacked bar is another simple way of showing a part-to-whole relationship. This chart is especially good when comparing variables across groups or at different points in time.
Only use maps when precise locations or geographical patterns in data are a crucial part of your message. When the interesting patterns in your data are not geographic patterns, a map may end up distracting from your message. Just because it can be mapped does not mean it must be mapped.
Source: Andy Kriebel & The Financial Times
A choropleth map, the standard approach for putting data on a map, displays geographical areas that are coloured, shaded or patterned in relation to a data variable. It is often better to use rates rather than totals for this type of map.
Source: United Nations Global Humanitarian Response Plan for COVID-19
A proportional symbol map scales the size of symbols in proportion to the data value found at each location. Generally, the larger the symbol, the higher the value at that location.
The way we communicate about the COVID-19 pandemic has the potential to shape our audience's perception of risk and safety. The design choices we make influence how our audience interprets our underlying data. This is because through our design we choose to emphasise certain aspects of the data while potentially obscuring others. While there are no hard and fast rules for making design choices, the following are a few considerations to think about when creating COVID-19 data visualisations.
Consider the Connotation of Colours
Red is a colour that grabs a reader’s attention and is often used to alert and highlight. A map full of overlapping red circles may communicate alarm and, as a result, may distract from other messages. Consider the connotation of colours when creating COVID-19 visualisations.
Plotting exponential growth on a linear scale can obscure important details such as the rate of growth. In the example below, the U.S. and Italy look like they are on similar trajectories on the linear scale, whereas on the logarithmic scale it is more apparent that COVID-19 cases are continuing to grow in the U.S. while Italy is seeing a slowdown.
Source: Kenneth Chang
The most common form of a line graph uses a linear scale. Along the Y axis, the numbers progress in equal steps: 1, 2, 3, or 10, 20, 30, etc. On a logarithmic scale, numbers on the Y axis don’t increase in equal increments. Instead, each interval increases by a set factor, often 10, although it could be 3 or 350 or 3,500, etc. A logarithmic scale prevents large numbers from skewing a whole graph and is good for showing rates of change, particularly rates of growth (e.g. number of cases, deaths, etc. over time). When COVID-19 cases are growing exponentially, it will often make sense to use a logarithmic scale because the cumulative case trend line can “keep up” with the numbers instead of shooting off the chart. Read more about logarithmic scales at The Conversation.
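The difference between the two scales can be checked with a little arithmetic. The following minimal Python sketch uses made-up case counts that multiply by 10 at each step (not real data). It compares the gaps between successive values as a linear axis would place them with the gaps after taking the base-10 logarithm, which is how a logarithmic axis positions them.

```python
import math

# Hypothetical cumulative case counts, growing tenfold at each step
# (illustrative numbers only, not real COVID-19 data).
cases = [10, 100, 1000, 10000]

# Linear axis: a point's height is proportional to the value itself,
# so the gaps between successive points keep growing.
linear_gaps = [later - earlier for earlier, later in zip(cases, cases[1:])]

# Logarithmic axis (base 10): height is proportional to log10(value),
# so a constant growth factor produces equal gaps.
log_positions = [math.log10(value) for value in cases]
log_gaps = [
    round(later - earlier, 6)
    for earlier, later in zip(log_positions, log_positions[1:])
]

print(linear_gaps)
print(log_gaps)
```

Because the counts grow by the same factor at every step, the gaps on the logarithmic axis are all equal and the series appears as a straight line. A slowdown in growth would show up as the line bending towards the horizontal.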
Know the Difference between Relative and Absolute Values
With the message you want to convey in mind, explore different ways to represent the data you are visualising. When comparing across countries or regions, consider how your visualisation will change based on your choice to use absolute numbers (total number of confirmed cases) versus relative numbers (number of confirmed cases per X thousand people). In the purest sense, an absolute value is a value compared to nothing, and a relative value is created when two absolute values are compared to each other. Relative values are often presented as percentages, proportions, rates, etc., and give readers a reference for judging magnitude and comparing it across different categories. The choice to use one or the other depends on the message you are trying to convey.
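As a small illustration of how the choice changes the picture, the Python sketch below converts absolute case counts into a rate per 100,000 people. The two countries, their figures and the per-100,000 denominator are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical figures for two made-up countries (not real data).
countries = {
    "Country A": {"confirmed_cases": 50_000, "population": 50_000_000},
    "Country B": {"confirmed_cases": 10_000, "population": 2_000_000},
}


def cases_per_100k(confirmed_cases, population):
    """Relative value: confirmed cases per 100,000 people."""
    return confirmed_cases * 100_000 / population


rates = {
    name: cases_per_100k(data["confirmed_cases"], data["population"])
    for name, data in countries.items()
}

# Which country looks worse depends on the measure used.
highest_absolute = max(countries, key=lambda n: countries[n]["confirmed_cases"])
highest_relative = max(rates, key=rates.get)

print(rates)
print(highest_absolute, highest_relative)
```

In this made-up example, Country A has five times as many confirmed cases in absolute terms, while Country B has five times as many cases relative to its population.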
There is a great deal of uncertainty around COVID-19, and this is present in the data. Try to find ways to convey that uncertainty in your visualisation. For example, adding the simple phrase ‘we know of X cases’ helps convey the critical message that the data we have is incomplete.
Source: Lisa Charlotte Rost
Get to the Point
Sometimes charts get in the way of the message you are trying to convey. To get to the point, consider whether it would be better to simply present the number or numbers you want to emphasise.
Do it yourself
It’s time to create your own visualisation. You have learned why we create visualisations, some considerations to make when visualising COVID-19 data and how to use various charts to highlight and communicate different relationships. In this final section, we'll walk through how to create a logarithmic line graph. A downloadable step-by-step guide is included.
Download data: Click on the video below to learn how to find and download data from HDX.
Prepare & structure your data: Click on the video below to learn how to organise, filter and structure your data in order to create a line graph visualising the growth in COVID-19 cases.
Insert a line graph: Next, watch the video below to learn how to insert a line graph as well as how to format the date axis in your chart.
Explore using a logarithmic scale: Watch the video below to learn how to format your Y axis using a logarithmic scale.
Customise the design: Now it’s time to customise the visualisation! Watch the video below to learn how to add text, change fonts and reduce clutter in your graph.
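The "Prepare & structure your data" step can also be sketched in code. The minimal Python example below assumes the downloaded file is in long format, with one row per country per date, and reshapes it into the layout a line graph typically needs: one row per date and one column per series. The column names, country names and numbers are hypothetical and do not come from an actual HDX file.

```python
import csv
import io

# Hypothetical long-format extract: one row per country per date.
# Column names and values are illustrative, not from a real HDX dataset.
raw_csv = """date,country,cumulative_cases
2020-03-01,Country A,100
2020-03-01,Country B,40
2020-03-01,Country C,5
2020-03-02,Country A,150
2020-03-02,Country B,80
2020-03-02,Country C,6
2020-03-03,Country A,210
2020-03-03,Country B,160
2020-03-03,Country C,9
"""

# Filter: keep only the series to be plotted.
selected = ["Country A", "Country B"]

# Organise: collect values by date, then by country.
table = {}
for row in csv.DictReader(io.StringIO(raw_csv)):
    if row["country"] in selected:
        table.setdefault(row["date"], {})[row["country"]] = int(
            row["cumulative_cases"]
        )

# Structure: one row per date, one column per selected country.
# ISO dates (YYYY-MM-DD) sort correctly as plain text.
header = ["date"] + selected
wide_rows = [
    [date] + [table[date].get(country) for country in selected]
    for date in sorted(table)
]

for line in [header] + wide_rows:
    print(line)
```

Each row of the result is one point on the X axis and each country column becomes one line in the graph. A missing value for a country on a given date is left empty (None) rather than filled with zero.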