Beautiful and informative data visualisation
Using ggplot2 to communicate your results
Created by Gergana - last updated 14th May 2019 by Sandra
Tutorial aims and steps:
All the files you need to complete this tutorial can be downloaded from this GitHub repository. Clone and download the repo as a zip file, then unzip it.
1. Good data visualisation and ggplot2 syntax
We’ve learned how to import our datasets into RStudio and how to format and manipulate them; now it’s time to talk about communicating the results of our analyses through data visualisation! When it comes to data visualisation, the ggplot2 package by Hadley Wickham has won over many scientists’ hearts. In this tutorial, we will learn how to make beautiful and informative graphs and how to arrange them in a panel. Before we tackle the ggplot2 syntax, let’s briefly cover what good graphs have in common.
Appropriate plot type for results: it might be a boxplot, a scatterplot, a linear regression fit... there are many options.
Plot is well organised: the independent (explanatory) variable is on the x axis and the dependent (response) variable is on the y axis.
X and Y axes use correct units: include proper symbols (for alpha, beta, etc.) and superscripts/subscripts where needed.
X and Y axes are easy to read: beware of awkward fonts and tiny letters.
It's easy to tell what the points/lines on the graph represent: don't put all your results on one plot; give them space to shine.
Clear and consistent colour scheme: stick with the same colours for the same variables, and avoid red/green combinations, which can look the same to colourblind people.
Plot has the right dimensions: avoid overlapping labels and points/lines that merge together, and make your graph longer or wider if needed.
Measures of uncertainty where appropriate: error bars, confidence intervals or credible intervals; remember to say in the caption what they are.
Concise and informative caption: remember to include what the data points show (raw data? model predictions?), the sample size for each treatment, the effect size and the measure of uncertainty that accompanies it.
ggplot2 is a great package to guide you through those steps. The gg in ggplot2 stands for grammar of graphics. Writing the code for your graph is like constructing a sentence made up of different parts that logically follow from one another. In a more visual way, it means adding layers that take care of different elements of the plot. Your plotting workflow will therefore be something like creating an empty plot, adding a layer with your data points, then your measure of uncertainty, the axis labels, and so on.
Just like onions (and ogres!), graphs in ggplot2 have layers.
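To make the layering idea concrete, here is a minimal sketch that builds one graph step by step: an empty plot with variables mapped to the axes, then the data points, then a measure of uncertainty, then axis labels. It assumes the dplyr and ggplot2 packages (both loaded later in this tutorial) and uses R's built-in `CO2` dataset purely for illustration; the object names, the summary (pooling all CO2 concentrations within each group) and the styling choices are ours, not part of the tutorial's LPI workflow. It also touches a few points from the checklist above: error bars, units with sub/superscripts, and a colourblind-friendly palette.

```r
library(dplyr)
library(ggplot2)

# Illustration only: R's built-in CO2 dataset (plant CO2 uptake), not the LPI data.
# For simplicity, all CO2 concentrations are pooled within each Type x Treatment group.
uptake_summary <- CO2 %>%
  group_by(Type, Treatment) %>%
  summarise(mean_uptake = mean(uptake),
            se = sd(uptake) / sqrt(n()),  # standard error of the mean
            n = n(),                      # sample size, worth reporting in the caption
            .groups = "drop")

dodge <- position_dodge(width = 0.4)  # place the two treatments side by side

# Layer 0: empty plot with variables mapped to the x axis, y axis and colour
uptake_plot <- ggplot(uptake_summary,
                      aes(x = Type, y = mean_uptake, colour = Treatment)) +
  # Layer 1: the data (group means)
  geom_point(size = 3, position = dodge) +
  # Layer 2: the measure of uncertainty (mean +/- 1 SE)
  geom_errorbar(aes(ymin = mean_uptake - se, ymax = mean_uptake + se),
                width = 0.1, position = dodge) +
  # Colourblind-friendly discrete palette (avoids red/green)
  scale_colour_viridis_d(end = 0.8) +
  # Axis labels with units; plotmath handles the subscript and superscripts
  labs(x = "Plant origin",
       y = expression(CO[2]~uptake~(mu*mol~m^-2~s^-1))) +
  theme_bw()

uptake_plot  # display the plot
```

A matching caption would state that points are group means, error bars show ± 1 standard error, and each group contains 21 measurements pooled across CO2 concentrations.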
2. Decide on the right type of plot
A key part of making any data visualisation is making sure that it is appropriate to your data type (e.g. discrete vs continuous) and fits your purpose, i.e. what you are trying to communicate!
You can start with our simple guide for common graph types, and visit the R Graph Gallery, a fantastic resource for ggplot2 code and inspiration!
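As a rough illustration of matching the plot type to the data type, the sketch below pairs a few common ggplot2 geoms with the kind of variables they suit. It uses R's built-in `mtcars` data for illustration only (not the data from this tutorial), and the pairings are common rules of thumb rather than hard rules; the object names are hypothetical.

```r
library(ggplot2)

# Illustration only: R's built-in mtcars data, not the LPI data used in this tutorial
cars <- mtcars
cars$cyl <- factor(cars$cyl)  # treat number of cylinders as a discrete (categorical) variable

# One continuous variable: show its distribution with a histogram
hist_plot <- ggplot(cars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Counts of a discrete variable: bar chart
bar_plot <- ggplot(cars, aes(x = cyl)) +
  geom_bar()

# Continuous response across discrete groups: compare groups with a boxplot
box_plot <- ggplot(cars, aes(x = cyl, y = mpg)) +
  geom_boxplot()

# Two continuous variables: show their relationship with a scatterplot
scatter_plot <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point()
```

Printing any of these objects (e.g. typing `box_plot` in the console) draws the corresponding graph.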
Feeling inspired? Let’s make these graphs!
3. Making different plots with ggplot2
Open RStudio, select File/New File/R script and start writing your script with the help of this tutorial.
# Purpose of the script
# Your name, date and email

# Your working directory, set to the folder you just downloaded from Github, e.g.:
setwd("~/Downloads/CC-4-Datavis-master")

# Libraries - if you haven't installed them before, run the code install.packages("package_name")
library(tidyr)
library(dplyr)
library(ggplot2)
library(readr)
library(gridExtra)
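If you are not sure which of these packages are already installed, a small check like the one below can help. It is a convenience sketch, not part of the original tutorial; the names `pkgs`, `installed` and `missing_pkgs` are chosen here for illustration, and the `install.packages()` line is left commented out so nothing is installed until you decide to run it.

```r
# Packages used in this tutorial
pkgs <- c("tidyr", "dplyr", "ggplot2", "readr", "gridExtra")

# TRUE for each package whose namespace can be loaded (it is not attached), FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)
} else {
  message("All packages are installed.")
}
```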
We will use data from the Living Planet Index, which you have already downloaded from the GitHub repository (click on Clone or Download/Download ZIP and then unzip the files).
# Import data from the Living Planet Index - population trends of vertebrate species from 1970 to 2014
LPI