How to deal with annoying medium sized data inside a Shiny app
Posted on October 30, 2022 by Econometrics and Free Software in R bloggers
[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers.]
This blog post is taken from a chapter of my ebook on building reproducible analytical pipelines, which you can read here.
If you want to follow along, you can start by downloading the data I use here. This is a smaller dataset made from the one you can get here.
Uncompressed, it’ll be a 2.4GB file. Not big data in any sense, but big enough to be annoying to handle without some optimization strategies (I’ve seen such data described as medium sized data before).
One such strategy is to let the computations run only once the user gives the green light by clicking on an action button. The next obvious strategy is to use packages that are optimized for speed. It turns out that the functions we have seen until now (note from the author: the functions we have seen until now if you’re one of my students sitting in the course where I teach this), from packages like {dplyr} and the like, are not the fastest. Their ease of use and expressiveness come at a speed cost. So we will need to switch to something faster, both for the computations and for reading in the data.
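To make the first strategy concrete, here is a minimal sketch of an app where nothing is computed until the user clicks a button. It uses `mtcars` as a stand-in for the real data, and the input names (`cyl`, `run`) and the reactive `summary_data` are made up for illustration; this is not the app built later in the post.

```r
library(shiny)

ui <- fluidPage(
  selectInput("cyl", "Cylinders", choices = c(4, 6, 8)),
  actionButton("run", "Run"),
  tableOutput("result")
)

server <- function(input, output, session) {
  # eventReactive() only re-evaluates when input$run changes;
  # changing input$cyl alone does not trigger the computation
  summary_data <- eventReactive(input$run, {
    subset(mtcars, cyl == as.numeric(input$cyl))
  })

  output$result <- renderTable(summary_data())
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui, server)
```

With a setup like this, the user can fiddle with the inputs as much as they like, and the expensive part only runs on a click.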
This faster solution is the {arrow} package, which is an R interface to the Arrow software developed by Apache.
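To give a rough idea of what working with {arrow} looks like, here is a small sketch. It writes `mtcars` to a temporary CSV file as a stand-in for the real dataset (so the file and column names are assumptions, not the data used in this post), reads it as an Arrow Table instead of a data frame, and only pulls the aggregated result into R at the end with `collect()`.

```r
library(arrow)
library(dplyr)

# Write a small stand-in CSV so the sketch is self-contained
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)

# as_data_frame = FALSE keeps the data as an Arrow Table
# instead of converting it to an R data frame
cars_arrow <- read_csv_arrow(path, as_data_frame = FALSE)

# The usual {dplyr} verbs are translated and executed by Arrow;
# collect() brings the (small) result back into R as a tibble
mean_mpg <- cars_arrow |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg)) |>
  collect()
```

The nice part is that the {dplyr} syntax stays the same, so switching to {arrow} is mostly a matter of changing how the data is read in and adding `collect()` at the end.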
The final strategy is to enable caching in the app.
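As a sketch of what caching can look like, the example below uses `bindCache()`, assuming Shiny 1.6.0 or later. The data (`mtcars`) and the input and output names (`cyl`, `mpg_plot`) are placeholders, so take this as one possible way to set up caching and not necessarily the exact setup used in the app.

```r
library(shiny)

ui <- fluidPage(
  selectInput("cyl", "Cylinders", choices = c(4, 6, 8)),
  plotOutput("mpg_plot")
)

server <- function(input, output, session) {
  output$mpg_plot <- renderPlot({
    hist(
      mtcars$mpg[mtcars$cyl == as.numeric(input$cyl)],
      main = NULL,
      xlab = "mpg"
    )
  }) |>
    # The plot is cached using input$cyl as the key: asking again
    # for a value that was already seen reuses the cached plot
    bindCache(input$cyl)
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui, server)
```

The cache key should include every input the result depends on, otherwise users could get a stale result for a different combination of inputs.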
So first, install the {arrow} package by running install.packages("arrow"). This will compile libarrow from source on Linux and might take some time, so perhaps go grab a coffee. On other operating systems, I guess that a binary version gets installed.
Before building the app, let me perform a very simple benchmark. The script below reads in the data, then performs some aggregations. This is done using standard {tidyverse} functions, but also using {arrow}:
start_tidy