Depending on your age, the song title ‘Word Up’ will either remind you of a bloke in a red codpiece, a Scottish rock band with a left field cover version, or possibly, Little Mix, (according to Google).
However, I urge you to put down your codpieces (yes, even you at the back) and instead gather round for tales of woe working with Microsoft Word.
Here’s the background: I’m currently working on a project where I am using parameterised rmarkdown for the first time – hooray! This will end up as 13 separate documents of about 30 pages, with lots of tables and charts. This is the first phase of many similar documents.
But, I have to render my final output to Word, which, I’m not going to lie, was a bit of a worry. That’s a lot of Word, and a lot of scope for things to go wrong.
While I’m not adverse to Excel, I am far from a fan of Word. Much of this is because of its infuriating layout quirks when it comes to margins,figures and tables.
The {officer}, {officedown} and {flextable} packages make working with Word easier. You can use {officer} to create Word and Powerpoint documents when working in R, {officedown} to do the same in R Markdown, and {flextable} for, well, tables.
The advantage of {officedown} is it allows dynamic figure and table captions, cross referencing, and whole host of other features ( like multi column layouts, changing page orientation, and different formats for different sections) {officer} is a lower level package, where you work with adding paragraphs, and other blocks. One of my favourite features is which allows you to insert an entire page, or even an entire document, into your rmarkdown file, as a one-liner.
{flextable} is not just for Word, and ranks alongside {gt} as a general purpose solution for all your table needs. If you haven’t considered it before now, it is definitely worth spending some time on.
There’s not a lot of documentation / help/ blog posts out there for working with these packages, especially for Word.
Alison Hill wrote a post detailing her friction log when working with Powerpoint, so consider this my friction log of working with Word.
(These are in no particular order)
You will see advice that says you need to knit a regular Word document ( from the New File menu item in RStudio), then open it (it doesn’t need any content) and then amend the styles in this document, before passing it to officer as a reference document.
My colleagues already had a .dot template with our departmental styles applied. I found that I could use that (after I’d saved a copy as a .docx file) and the styles were picked up by {officedown}. No need to generate a blank file and amend everything from scratch (there are a lot of style elements and it would have been really tedious). If you already have a document with styles in place, you should be good to go. Here’s how my Rmd file looked. You specify as the output format, and I saved my template file in the root directory so it was easy to reference :
Throughout this project I’m recreating an existing suite of tables and plots that have, until now, been produced in Excel & Word. The first table had totals at Scotland, Council and Health Board level, then a space, then a range of lower level data.
Once I’d got the data into shape, I first added the space using .
But, over time, this felt wrong – I didn’t like the idea of adding an unnecessary empty row to a dataframe, especially if I might need it for further analysis later on.
The solution, instead, was to add to the relevant row(s)
Here I’m adding padding to rows 1 and 9 of my table ( flextable uses the and conventions for applying functions and modifications to rows and columns), and ensuring the padding goes at the bottom by setting to .
Another way to improve presentation in some of my tables was to reduce row heights for some rows, and increase it for others.
Row 4 gets less space, while the other row heights are increased. N.B – I thought I could do something like
but it didn’t seem to like it, so each row got entered individually instead.
Here I amend the 3rd, 4th and 5th columns – because the 3rd column name is dynamic, I didn’t want to refer to it by name
For other tables, where I know the column names are static, I can enter the names directly:
I needed to add header row above the final two columns in my seven column table, denoting whether there was a statistically significant difference between them and a reference value.
The first 5 columns get passed nothing, while “Significance” spans the final two columns.
Using , we can add a custom background colour to highlight a specific row.
I found that when creating the table in a function, I had to to use the syntax below:
Prior to this, I was using the syntax below, but it would not work inside a function.
Easy enough, give the column name you want to amend and pass it
Ok, technically not an {officer} / {officedown} thing, but here it is anyway. You’ll see I’m specifying the figure captions here, also
If this sounds crazy / stupid, then here’s the scenario. This particular table looks at deprivation. Not all areas have datazones within a certain deprivation decile, hence, there will not always be data to show. However, I want the table numbers to be consistent across all the documents – table 10 in doc1 should address the same topic as table 10 in doc 2 (instead of showing table 11 data under a table10 heading).
I went belt and braces here, using and
This returns an empty table for the edge case where currently no zones are in the specified decile.
Here’s a strange one. When knitting docs individually, the resulting Word file opened and the table of contents was displayed. But, when rendering all of the docs in a loop using purrr, there was no TOC, and instead, I was askedif I’d like to update external links within the document. If I said ‘no’ , then the TOC did not update. If I said ‘yes’, then it would update, and appear as intended. I’m not sure why I only saw this when rendering with , but there doesn’t appear to be a way round it. This means, when the final documents are distributed, we will have to advise readers to accept any prompts to update external links, which always feels a bit uncomfortable.
TLDR – lots of my variables had within their names. In some datasets, this was changed to the ampersand. For example became
I need to unify these, and let’s face it, typing “&” is easier, and takes up less space.
Big mistake. All seemed to be going well, until I tried generating a report with as the parameter. I’m sure someone is laughing at this, thinking, “well of course you can’t have “&”, that’s a special character in html / xml”, but I didn’t know that, being as I’m not an actual super nerd. (Well, OK, I am, but even I have my limits).
Debugging this took me a long time. My first thought was I’d messed up my functions, and then I got some great advice on Twitter, and realised that the markdown file was rendering 100% completely and the failure point was between the .md file and the final Word output.
Reverting all ampersands back to ( which mean re-running the entire pipeline from scratch) allowed me to generate the elusive final document, and a lesson learned.
I like ‘&’. I like ‘and’. But which is better? There’s only one way to find out. FIGHHTTTT!
Depends a lot on your margins, but instead of relying on , this seems to work well :
I’m going to leave that for the next post, which I hope will be published within the next few days..so please look out for the follow up post!