
The Data Daily

A Footnote in History | R-bloggers


A Footnote in History
Posted on October 31, 2022 by Category R on Quantum Jitter in R bloggers
[This article was first published on Category R on Quantum Jitter, and kindly contributed to R-bloggers.]
Producing a journal-quality table requires fine-grained and reproducible control over presentation. Surgical targeting of footnotes, capable of adapting to changes in the underlying data, is one example.
This post briefly explores shifts in the nature of employment while also visiting gt, the grammar of tables and natural companion to ggplot2, the grammar of graphics.
```r
library(tidyverse)
library(readxl)
library(gt)
library(kableExtra)
```
Tables appear in every project on this site, at a minimum at the end, where an auto-generated table summarises the R packages and functions used. For these tables, kableExtra has been the go-to solution: it simply requires piping the data frame into kbl().
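For readers who haven't used it, a minimal kableExtra sketch looks like this. It uses the built-in mtcars data rather than anything from this post, and kable_styling() is an optional extra that adds Bootstrap styling to HTML output.

```r
library(kableExtra)

# Pipe a data frame straight into kbl(), then add optional Bootstrap styling
mtcars_tbl <- head(mtcars[, 1:4]) |>
  kbl(caption = "First six rows of mtcars") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

mtcars_tbl
```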
In Digging Deep, the DT package is used to produce an interactive table with sortable and searchable columns. DT is an R interface to the JavaScript DataTables library; at the time of writing, gt does not support this kind of interactivity.
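For comparison, here is a minimal DT sketch, again on mtcars rather than this post's data. datatable() wraps the data frame in a DataTables widget with paging, sorting and a search box, and filter = "top" adds per-column filter boxes as well.

```r
library(DT)

# Interactive table: paging, sorting and global search come from DataTables;
# filter = "top" adds a filter box above each column
dt_tbl <- datatable(
  head(mtcars, 20),
  filter = "top",
  options = list(pageLength = 5)
)

dt_tbl
```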
As a guiding principle, RStudio (soon to be Posit) packages are my first port of call. This gives me confidence in cross-package consistency, longevity, and continued investment in development and support. Hence gt will be the go-to package for the static table further down.
As the intent is to present a summary in the style of the Financial Times, we'll need a suitable custom colour palette.
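```r
theme_set(theme_bw())

# FT-style palette. The hex codes were lost in extraction; these six are
# assumed to be the colours used in the FT-styled table further down.
cols <- c(
  "#FFF1E5", "#F2DFCE", "#262A33", "#800D33", "#00994D", "#C00000"
) |>
  fct_inorder()

# One bar per colour, each labelled with its hex code
tibble(x = 1:6, y = 1) |>
  ggplot(aes(x, y, fill = cols)) +
  geom_col() +
  geom_label(aes(label = cols), nudge_y = -0.1, fill = "white") +
  annotate(
    "label", x = 3.5, y = 0.5, label = "Financial Times",
    fill = "white", alpha = 0.8, size = 6
  ) +
  scale_fill_manual(values = as.character(cols)) +
  theme_void() +
  theme(legend.position = "none")
```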
The labour market data are sourced from the Office for National Statistics (ONS).
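```r
# Read one ONS workbook and tag it with its year, taken from the file name
# (e.g. "2021.xlsx"). The read_xlsx() arguments and the tidying that yields
# the `occupation` and `persons` columns were lost in extraction.
read_data <- function(x) {
  read_xlsx(x) |>
    # ... (tidying steps lost in extraction) ...
    mutate(year = x |> str_remove("\\.xlsx$") |> as.integer())
}

# Row-bind all years into one long data frame.
# xlsx_files is a hypothetical name: the original vector of paths was lost.
pop_df <- xlsx_files |> map_dfr(read_data)
```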
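Because the reading step above is incomplete, here is a self-contained sketch of the same pattern. fake_read() is a hypothetical stand-in that returns one made-up row instead of opening a workbook, but it derives year from the file name in the same way; map_dfr() then row-binds one result per file. The paths are also hypothetical, and basename() is added so that a directory prefix doesn't break the integer conversion.

```r
library(tidyverse)

# Hypothetical stand-in for read_data(): one made-up row per "file",
# with the year parsed from the file name
fake_read <- function(x) {
  tibble(occupation = "1111 Example unit group", persons = 100) |>
    mutate(year = x |> basename() |> str_remove("\\.xlsx$") |> as.integer())
}

# map_dfr() calls the reader once per path and row-binds the results
files <- c("data/2004.xlsx", "data/2021.xlsx")  # hypothetical paths
demo_df <- map_dfr(files, fake_read)

demo_df
```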
There’s a hierarchy to the data, so I’ll extract the lowest level and then slice off the top and bottom occupations based on their percentage change over time.
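```r
change_df <- pop_df |>
  # keep only the lowest level of the hierarchy: 4-digit SOC unit groups
  filter(str_starts(occupation, "\\d{4} ")) |>
  pivot_wider(names_from = year, values_from = persons) |>
  # split "1234 Occupation name" into the code and the name
  separate(occupation, into = c("soc", "occupation"), sep = 5) |>
  mutate(change = `2021` / `2004` - 1) |>
  arrange(desc(change)) |>
  # grouping condition partly lost in extraction; `row_number() <= 10` assumed
  mutate(group = if_else(row_number() <= 10, "Risers", "Fallers")) |>
  # keep the first 10 and last 10 rows
  slice(c(1:10, (n() - 9):n())) |>
  relocate(group)
```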
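To see what the filtering, reshaping and splitting steps do, the sketch below runs them on a tiny hypothetical data set (made-up codes and counts, not ONS figures). Only the two 4-digit rows survive the filter. Because sep = 5 splits after the fifth character, soc keeps a trailing space.

```r
library(tidyverse)

# Hypothetical toy data mimicking the hierarchy: one 2-digit group row
# and two 4-digit unit-group rows, each observed in two years
toy_df <- tibble(
  occupation = rep(c("11 Example major group",
                     "1111 Example unit group A",
                     "1112 Example unit group B"), times = 2),
  year = rep(c(2004L, 2021L), each = 3),
  persons = c(500, 100, 40, 600, 150, 30)
)

toy_change <- toy_df |>
  filter(str_starts(occupation, "\\d{4} ")) |>  # drops the 2-digit row
  pivot_wider(names_from = year, values_from = persons) |>
  separate(occupation, into = c("soc", "occupation"), sep = 5) |>
  mutate(change = `2021` / `2004` - 1) |>        # e.g. 150 / 100 - 1 = 0.5
  arrange(desc(change))

toy_change
```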
Footnote handling is a particularly nice feature of gt: the package automatically assigns the superscript marks (numbers, or optionally symbols) and keeps them in order so that they flow naturally. Targeting also offers a high degree of control and reproducibility.
For example, two entries in the table below use the abbreviation n.e.c. The footnote can target the rows that contain that string, rather than requiring the rows to be identified manually. Once it is added, any subsequent footnotes are renumbered to preserve the flow. So if I were to change the source datasets to different years or countries, every occurrence of n.e.c. would be automagically found and appropriately footnoted.
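Ahead of the full table, here is a minimal sketch of that targeting on a hypothetical three-row table. The "n.e.c." footnote attaches to whichever stub labels contain the string, and gt numbers a second footnote on a column label automatically. as_raw_html() renders the table to an HTML string, which is a convenient way to check the output without a browser.

```r
library(gt)
library(tibble)

# Hypothetical mini table: two of the three row labels contain "n.e.c."
mini <- tibble(
  occupation = c("Example trade n.e.c.", "Example role", "Other example n.e.c."),
  persons = c(120, 340, 56)
)

mini_tbl <- mini |>
  gt(rowname_col = "occupation") |>
  tab_footnote(
    footnote = "Not elsewhere classified",
    # target stub cells by matching their label text, not by row position
    locations = cells_stub(rows = contains("n.e.c."))
  ) |>
  tab_footnote(
    footnote = "Count of all persons",
    locations = cells_column_labels(columns = persons)
  )

# Render to an HTML string for inspection
mini_html <- as_raw_html(mini_tbl, inline_css = FALSE)
```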
```r
gt_tbl <- change_df |>
  gt(rowname_col = "occupation", groupname_col = "group") |>
  tab_header(title = "UK Employment by Occupation") |>
  fmt_number(columns = starts_with("2"), decimals = 0) |>
  fmt_percent(columns = starts_with("c"), decimals = 0, force_sign = TRUE) |>
  sub_missing() |>
  tab_spanner(label = "Year", columns = starts_with("2")) |>
  tab_style(
    style = cell_text(transform = "capitalize"),
    locations = cells_column_labels(!starts_with("s"))
  ) |>
  tab_style(
    style = cell_text(transform = "uppercase"),
    locations = cells_column_labels("soc")
  ) |>
  # footnotes are targeted by content, so they follow the data
  tab_footnote(
    footnote = "Not elsewhere classified",
    locations = cells_stub(rows = contains("n.e.c."))
  ) |>
  tab_footnote(
    footnote = "Count of all persons",
    locations = cells_column_spanners()
  ) |>
  tab_footnote(
    footnote = "Standard Occupational Classification 2020",
    locations = cells_column_labels(columns = "soc")
  ) |>
  tab_footnote(
    footnote = "Top & bottom 10 occupations ordered by percent change",
    locations = cells_row_groups(groups = c("Risers", "Fallers"))
  ) |>
  tab_footnote(
    footnote = "Figures suppressed as statistically unreliable",
    locations = cells_body(columns = c(change, `2021`), rows = is.na(change))
  ) |>
  tab_source_note(source_note = "Source: Office for National Statistics (ONS)")

# Apply a built-in theme and save; PNG output relies on the webshot2 package
gt_tbl |>
  opt_stylize(style = 6, color = "gray", add_row_striping = TRUE) |>
  gtsave("styled.png")
```
The table above uses one of gt's built-in style themes, and it looks clean and polished. Sometimes, though, a published table needs a high degree of customisation, for example to match a specific brand. gt supports this, as we'll demonstrate by attempting to replicate the style of the market-data tables in the Financial Times.
```r
gt_ft <- gt_tbl |>
  # FT-style background, text and striping colours
  tab_options(
    table.border.top.color = "#FFF1E5",
    table.border.bottom.color = "#FFF1E5",
    table.background.color = "#FFF1E5",
    table.font.size = 8,
    table.font.color = "#262A33",
    row.striping.include_table_body = TRUE,
    row.striping.include_stub = TRUE,
    row.striping.background_color = "#F2DFCE",
    heading.background.color = "#FFF1E5",
    row_group.background.color = "#FFF1E5"
  ) |>
  opt_vertical_padding(scale = 1.3) |>
  # title followed by a small logo image
  tab_header(title = html(
    "UK Employment by Occupation ",
    local_image("logo.png", height = 15)
  )) |>
  tab_style(
    style = list(
      cell_text(font = "Financier Display", size = px(15), align = "left"),
      cell_borders(sides = "bottom", weight = px(3), color = "#262A33")
    ),
    locations = cells_title()
  ) |>
  tab_style(
    style = cell_text(size = 14),
    locations = cells_row_groups()
  ) |>
  tab_style(
    style = cell_text(color = "#800D33", weight = "bold"),
    locations = cells_stub()
  ) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = list(
      cells_column_labels(), cells_column_spanners(),
      cells_row_groups(), cells_title()
    )
  ) |>
  tab_style(
    style = cell_borders(style = "hidden"),
    locations = list(cells_body(), cells_row_groups(), cells_stub())
  ) |>
  # green for rises, red for falls
  tab_style(
    style = cell_text(color = "#00994D", weight = "bold"),
    locations = cells_body(columns = change, rows = change >= 0)
  ) |>
  tab_style(
    style = cell_text(color = "#C00000", weight = "bold"),
    locations = cells_body(columns = change, rows = change < 0)
  ) |>
  tab_style(
    style = cell_text(color = "grey40", size = px(6)),
    locations = list(cells_footnotes(), cells_source_notes())
  )

gt_ft |> gtsave("ft.png", zoom = 5)
```
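The conditional colouring of the change column is probably the most reusable part of this styling. Here is a stripped-down sketch on two hypothetical rows. The rows argument of cells_body() takes an expression evaluated against the table's data, so the colours follow the sign of change rather than fixed row positions.

```r
library(gt)
library(tibble)

# Hypothetical values: one rise, one fall
moves <- tibble(
  occupation = c("Example riser", "Example faller"),
  change = c(0.35, -0.20)
)

moves_tbl <- moves |>
  gt(rowname_col = "occupation") |>
  fmt_percent(columns = change, decimals = 0, force_sign = TRUE) |>
  # the rows expressions are evaluated on the data, so styling tracks the values
  tab_style(
    style = cell_text(color = "#00994D", weight = "bold"),
    locations = cells_body(columns = change, rows = change >= 0)
  ) |>
  tab_style(
    style = cell_text(color = "#C00000", weight = "bold"),
    locations = cells_body(columns = change, rows = change < 0)
  )

moves_html <- as_raw_html(moves_tbl, inline_css = FALSE)
```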
R Toolbox
Summarising below the packages and functions used in this post enables me to separately create a toolbox visualisation of their usage across all posts.
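How the toolbox data are gathered isn't shown here. One possible approach, purely a hypothetical sketch and not necessarily the method used on this site, is to parse the code with base R's getParseData() and keep the tokens for function calls and pkg:: prefixes. Packages attached with library() appear only as ordinary symbols, so they would need separate handling.

```r
# Hypothetical snippet standing in for a post's code chunks
code <- '
library(gt)
tbl <- gt::gt(head(mtcars)) |> gt::tab_header(title = "Demo")
'

# keep.source = TRUE is needed for getParseData() to return token data
pd <- getParseData(parse(text = code, keep.source = TRUE))

# Names of called functions, and packages referenced via pkg::fn
fns  <- unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])
pkgs <- unique(pd$text[pd$token == "SYMBOL_PACKAGE"])

fns
pkgs
```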