what is a treemap? — storytelling with data


by Simon Rowe
This article is part of our back-to-basics blog series called what is…?, where we’ll break down some common topics and questions posed to us. We’ve covered much of the content in previous posts, so this series allows us to bring together many disparate resources, creating a single source for your learning. We believe it’s important to take an occasional pulse on foundational knowledge, regardless of where you are in your learning journey. The success of many visualisations is dependent on a solid understanding of basic concepts. So whether you’re learning this for the first time, reading to reinforce core principles, or looking for resources to share with others (like our new comprehensive chart guide), please join us as we revisit and embrace the basics.
What is a treemap?
In the early 1990s, Dr Ben Shneiderman was trying to solve the challenge of finding a way to adequately evaluate the vast file directory on a computer. At that time, existing visualisation methods weren’t suitable due to the potentially complex network of folder structures.
The folders shown in this image account for just 0.05% of the total hard drive capacity!
Dr Shneiderman developed the “treemap” in order to visualise this large amount of data—with multiple levels of folders and subfolders—in an efficient way, without taking up too much screen real estate. The treemap uses a series of nested rectangles, sized proportionally to the corresponding data value, to deliver an organised and multi-level view into any hierarchical data set.
The hierarchy aspect comes when the initial, larger rectangles (branches) are then tiled with smaller rectangles (sub-branches) representing the next level down in the hierarchy. In the examples below, step 1 shows the overall units sold before sales are broken down by city (step 2) and finally, by city and product (step 3).
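The three steps described above can be sketched in a few lines of Python. This is only an illustration: the city and product names, the unit counts, and the helper names `level_totals` and `area_shares` are all hypothetical and do not come from the original charts. The idea is that each level's total determines how much of the overall area its rectangle receives.

```python
# Hypothetical units sold, keyed by city and then by product.
sales = {
    "City A": {"Product X": 40, "Product Y": 25},
    "City B": {"Product X": 20, "Product Y": 15},
}


def level_totals(tree):
    """Return (grand_total, {branch: branch_total}) for a two-level hierarchy."""
    branch_totals = {branch: sum(leaves.values()) for branch, leaves in tree.items()}
    return sum(branch_totals.values()), branch_totals


def area_shares(tree):
    """Fraction of the full treemap area owed to each (branch, leaf) pair."""
    grand_total, _ = level_totals(tree)
    return {
        (branch, leaf): value / grand_total
        for branch, leaves in tree.items()
        for leaf, value in leaves.items()
    }


grand_total, by_city = level_totals(sales)
shares = area_shares(sales)

print(grand_total)  # step 1: overall units sold
print(by_city)      # step 2: units sold per city
print(shares)       # step 3: share of the total area per city and product
```

Because every share is a fraction of the same grand total, the smallest rectangles remain directly comparable with the largest ones, whichever branch they sit in.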
Let’s take a look at another example.
This treemap visualises the total transfer expenditure of the football (soccer) clubs in Europe’s top five leagues. The size of each rectangle represents the total value, in pounds sterling (GBP), of players purchased, and colour is applied categorically to each of the five leagues.
Sorting
The rectangles in a treemap are sorted by size, starting with the largest in the top left corner, progressing down to the smallest in the bottom right. When multiple hierarchical levels are displayed this order is repeated for each of the nested rectangles.
In this example, the first level of the hierarchy is “league,” so the largest league (the English Premier League) is in the top left corner and the smallest (the French Ligue 1) is in the bottom right. The second level of the hierarchy is “club,” so within each league the individual clubs are sorted with the largest at the top left (Chelsea, Juventus, Barcelona, Bayern Munich, and PSG) and the smallest at the bottom right.
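The ordering rule can be expressed as a sort applied once per level. The sketch below assumes a two-level dictionary; the league and club names and the spending figures are placeholders, not the values behind the chart, and `sort_hierarchy` is a hypothetical helper name.

```python
def sort_hierarchy(tree):
    """Order branches by their totals, then leaves within each branch, largest first."""
    ordered = []
    branches = sorted(tree.items(), key=lambda kv: sum(kv[1].values()), reverse=True)
    for branch, leaves in branches:
        ordered_leaves = sorted(leaves.items(), key=lambda kv: kv[1], reverse=True)
        ordered.append((branch, ordered_leaves))
    return ordered


# Hypothetical spending figures, deliberately listed out of order.
spend = {
    "League B": {"Club 3": 30, "Club 4": 50},
    "League A": {"Club 1": 120, "Club 2": 200},
}

ordered = sort_hierarchy(spend)
for league, clubs in ordered:
    print(league, clubs)
```

A layout routine that places items in this order, starting from the top left, would reproduce the largest-to-smallest pattern described above at both levels.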
Sizing
The rectangles can be sized and ordered using a number of different algorithms. We won’t cover the nuances of these algorithms in this post—often, the built-in treemap tools in our software applications are a bit of a black box—but Excel and Tableau, among other popular programs, use the “squarified” algorithm, which attempts to render each rectangle as much like a square as the data and overall layout allow.
It is important to note that the size of each rectangle is determined by the data itself, and a treemap cannot be used to show zero or negative-value data points.
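To make the sizing rule concrete, here is a minimal sketch of a much simpler layout than the squarified algorithm: it cuts one rectangle into strips whose areas are proportional to the values, an approach often called slice-and-dice. It is not what Excel or Tableau do internally. The function name `slice_layout`, the canvas size, and the values are assumptions chosen for illustration. Note that it rejects zero and negative values, for the reason given above.

```python
def slice_layout(values, x, y, width, height, horizontal=True):
    """Split one rectangle into strips whose areas are proportional to values.

    Returns a list of (x, y, width, height) tuples, one per value.
    """
    if any(v <= 0 for v in values):
        raise ValueError("treemap values must all be greater than zero")
    total = sum(values)
    rects = []
    offset = 0.0
    for v in values:
        fraction = v / total
        if horizontal:  # strips placed side by side, left to right
            rects.append((x + offset, y, width * fraction, height))
            offset += width * fraction
        else:  # strips stacked from top to bottom
            rects.append((x, y + offset, width, height * fraction))
            offset += height * fraction
    return rects


# Top level: two hypothetical branches on a 100 x 60 canvas.
top = slice_layout([65, 35], 0, 0, 100, 60, horizontal=True)

# Second level: tile the first branch in the other direction.
bx, by, bw, bh = top[0]
inner = slice_layout([40, 25], bx, by, bw, bh, horizontal=False)
print(top)
print(inner)
```

Alternating the split direction at each level is what produces the nesting. The trade-off is that strips can become long and thin, which is the problem the squarified approach tries to reduce.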
Colour
As in the example above, colour can be used categorically, to emphasise the borders between the top level rectangles. Alternatively, colour can be applied in heatmap-fashion to denote quantitative values.
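The two colouring approaches can be sketched as follows. Both helpers, the palette, and the default white-to-dark-blue ramp are illustrative assumptions, not the colours used in the example chart. The first assigns one colour per top-level category; the second blends linearly between two RGB colours according to where a value falls in a range.

```python
def categorical_colours(categories, palette):
    """Assign one palette colour per top-level category, cycling if needed."""
    return {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}


def sequential_colour(value, low, high, start=(255, 255, 255), end=(0, 0, 139)):
    """Blend linearly between two RGB colours by value's position in [low, high]."""
    if high == low:
        t = 0.0
    else:
        # Clamp so out-of-range values map to the nearest end of the ramp.
        t = min(max((value - low) / (high - low), 0.0), 1.0)
    r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
    return f"#{r:02x}{g:02x}{b:02x}"


by_category = categorical_colours(["A", "B", "C"], ["#111111", "#222222"])
lightest = sequential_colour(0, 0, 100)
darkest = sequential_colour(100, 0, 100)
print(by_category, lightest, darkest)
```

In practice a charting tool would supply both schemes; the sketch only shows that categorical colour depends on which branch a rectangle belongs to, while heatmap-style colour depends on a second quantitative value.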
Interactivity
By their very nature, treemaps pose a challenge when it comes to labelling smaller rectangles in the visual, resulting in a number of unnamed segments. For this reason, it’s more common to see treemaps used interactively, where a user can click or hover over a segment to display those values, and potentially discover more detailed information.
When would you use a treemap?
Treemaps can be suitable in a number of situations, depending on your data, your available space, and the relationships you need to explore or emphasise with your visual.
You want to visualise a part-to-whole relationship among a large number of categories.
While a pie chart might work best with three or fewer segments, a treemap works well with many. Patterns in the data are easier to see, making treemaps a good choice to use at the exploration stage of analysis. (We’ll look at an example of this shortly.)
Precise comparisons between categories are not important.
Treemaps provide an excellent high-level view of categories, and of subcategories that appear similar or markedly different. Efficient use of colour will aid these comparisons. You may lose some precision due to the number of rectangles, the reliance on comparing values by area, and a loss of legible labels, but in the exploratory phase of your analysis, that may be a worthwhile trade-off.
You need to prioritise the efficient use of space.
The ability to display thousands of items within a small amount of screen real estate is a benefit of the treemap. The only limitation is the number of rectangles that can be legibly labelled.
Your data is hierarchical.
You’d be hard-pressed to find a more elegant or efficient way to visualise data at multiple hierarchical levels than the treemap. That does stand to reason, as the need to accomplish just such a task was the impetus for its creation in the first place.
Turning our attention back to the example, we can make some quick observations.
The English Premier League (in dark blue) represents about ⅓ of the total spending across these five leagues
France (orange) and Germany (gray) have spent far less than England, Italy and Spain
The top four spending teams in Italy (yellow) represent about 50% of the total spend in Serie A
Bayern Munich (towards the top right) have spent around half the total of Chelsea (in the top left)
We are making general observations here, which is what treemaps allow us to do nicely. We could add labels (where space allows) to display the value spent as well.
What are the challenges of using treemaps?
Your audience may be unfamiliar with them.
A typical issue with more novel chart types like the treemap is that while they are visually appealing, they may require us to do some explanation regarding how they’re meant to be read. In many cases, another chart type would display the data more effectively, especially if we are able to thoughtfully aggregate or edit the data we choose to share.
It is difficult to make precise comparisons of areas.
We are exceptionally efficient at making visual comparisons based on length and height, or on the positions of dots, whereas we can’t assess area or intensity of colour as accurately. If the data shows no significant differences, filling our treemap with rectangles of similar but not identical sizes, those comparisons become especially difficult. Always consider whether a bar chart or line graph might achieve audience understanding more quickly.
They can be overwhelming to process.
While the ability to visualise thousands of items across multiple hierarchical levels is a supporting point of treemaps, this sheer quantity of information can become overwhelming to audiences. If the majority of your rectangles are not labelled or the different levels of the hierarchy are indiscernible, then consider a different way to visualise the data.
They can only be used for data sets in which every value is greater than zero.
The value used to size a rectangle cannot be zero or negative. This limits potential use cases: you can’t visualise profit and loss, for example.
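One way to check a data set before attempting a treemap is to separate the records that can be sized from those that cannot. The helper name `split_plottable` and the profit figures below are hypothetical.

```python
def split_plottable(records):
    """Separate records a treemap can size (value > 0) from those it cannot."""
    plottable = {name: value for name, value in records.items() if value > 0}
    excluded = {name: value for name, value in records.items() if value <= 0}
    return plottable, excluded


# Hypothetical profit figures: losses and break-even results cannot be drawn.
profit = {"Division A": 120, "Division B": -45, "Division C": 0, "Division D": 60}

plottable, excluded = split_plottable(profit)
print(plottable)
print(excluded)
```

If the excluded group is not empty, the treemap would silently misrepresent the whole, which is a strong hint to choose a different chart.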
Some of these challenges become apparent when we ask more nuanced questions of our example.
Which league has spent more: Germany, or France?
Have England’s top six teams spent more than Italy’s?
What proportion of spend do the top six teams in each league represent?
How much have Nottingham Forest spent?
OK, the last question is a little unfair. The rectangle for Nottingham Forest isn’t even labelled, but that in itself is a challenge that has been referred to above. Notice how these questions are substantially harder to answer, as they require a more precise comparison than this treemap can provide.
Alternatives to treemaps
In many cases, treemaps can be replaced with bar charts (for data that has one quantitative and one categorical variable) or scatter plots (for data with two quantitative variables).
Let’s reimagine our example by using two bar charts to tell the story of the overall spending between the leagues, and how that balances between the top 6 spending teams and the rest. While these insights were found by exploring the treemap, they are arguably better explained using the bars below.
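The aggregation behind such a pair of bar charts is simple to sketch. The club values below are placeholders rather than real transfer figures, and `top_n_vs_rest` and `text_bar` are hypothetical helpers; the text bars stand in for a proper chart.

```python
def top_n_vs_rest(values, n=6):
    """Return (sum of the n largest values, sum of the remaining values)."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:n]), sum(ordered[n:])


def text_bar(label, value, scale=10, width=12):
    """Render one horizontal bar as text, one '#' per `scale` units."""
    return f"{label:<{width}}{'#' * round(value / scale)} {value}"


# Hypothetical spending per club in one league.
club_spend = [10, 50, 30, 20, 40]

top, rest = top_n_vs_rest(club_spend, n=2)
print(text_bar("Top 2", top))
print(text_bar("Rest", rest))
```

Because the bars share a common baseline, the comparison between the top spenders and the rest relies on length rather than area, which is the kind of judgement we make more accurately.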
As with every chart type, ensure the treemap is selected for the right reasons. That is to say, ensure the visual fits the story and data supporting it, and don’t attempt to force the data into a sub-optimal chart selection or make this chart selection for personal or aesthetic reasons.
Where can I learn more about treemaps?
Read about the technical details of treemap research from the University of Maryland
