The Data Daily

CRAN Task View: Sports Analytics

Maintainer:
Benjamin S. Baumer, Quang Nguyen, Gregory J. Matthews
Contact:
https://github.com/cran-task-views/SportsAnalytics/
Contributions:
Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.
Citation:
Benjamin S. Baumer, Quang Nguyen, Gregory J. Matthews (2022). CRAN Task View: Sports Analytics. Version 2022-05-11. URL https://CRAN.R-project.org/view=SportsAnalytics.
Installation:
The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("SportsAnalytics", coreOnly = TRUE) installs all the core packages, and ctv::update.views("SportsAnalytics") installs all packages that are not yet installed or not up-to-date. See the CRAN Task View Initiative for more details.
This CRAN Task View contains a list of packages useful for sports analytics. Most of the packages are sport-specific and are grouped as such. However, we also include a General section for packages that provide ancillary functionality relevant to sports analytics (e.g., team-themed color palettes), and a Modeling section for packages useful for statistical modeling. Throughout the task view, and collected in the Related links section at the end, we have included a list of selected books and articles that use some of these packages in substantive ways. Our goal in compiling this list is to help researchers find the tools they need to complete their work in R.
To be considered for inclusion, the package must be useful for conducting sports analytics. Most packages provide functionality for some combination of:
acquiring data for a specific sport or league
performing common computations on sport-specific data
Esports and sports betting packages are within scope.
The list of packages is aspirationally comprehensive. If there is a sports analytics package on CRAN that we have missed, please let us know. Contributions are always welcome, and encouraged – please see the linked GitHub repository for details.
General
teamcolors provides color palettes, ggplot2 themes, xaringan themes, and logos for professional teams across a variety of sports and leagues. teamcolors was originally designed to create the data graphics in Lopez et al. (2018) (doi:10.1214/18-AOAS1165).
colorr contains color palettes for professional sports teams in the EPL, MLB, NBA, NHL, and NFL.
nbapalettes contains color palettes inspired by NBA team jersey colors.
sportyR contains functions for creating ggplot2 representations of sports playing surfaces pursuant to rule-book specifications. This is particularly useful for plotting player tracking data.
SportsTour provides functions for displaying tournament fixtures using knock-out and round-robin methods.
TouRnament consists of two functions: 1) creating league tables based on results and 2) creating a match schedule for a league.
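Both SportsTour and TouRnament deal with round-robin scheduling. As a rough illustration of the idea (not the API of either package), the sketch below uses the classic circle method: one team stays fixed while the others rotate, so every pair meets exactly once. The helper name round_robin() is hypothetical, and an NA placeholder is assumed to mark a bye when the number of teams is odd.

```r
# Circle-method round-robin schedule (illustrative sketch).
# round_robin() is a hypothetical helper, not part of SportsTour or TouRnament.
round_robin <- function(teams) {
  # With an odd number of teams, add an NA placeholder; pairing with NA is a bye
  if (length(teams) %% 2 == 1) teams <- c(teams, NA)
  n <- length(teams)
  rounds <- vector("list", n - 1)
  for (r in seq_len(n - 1)) {
    home <- teams[seq_len(n / 2)]
    away <- rev(teams[(n / 2 + 1):n])
    keep <- !is.na(home) & !is.na(away)  # drop the bye pairing, if any
    rounds[[r]] <- data.frame(round = rep(r, sum(keep)),
                              home = home[keep], away = away[keep])
    # Keep the first team fixed; move the last team to position 2 and shift the rest
    teams <- c(teams[1], teams[n], teams[-c(1, n)])
  }
  do.call(rbind, rounds)
}

round_robin(c("A", "B", "C", "D"))
```

With four teams this yields three rounds of two matches each, and every team appears in exactly three fixtures.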
Sport-Specific Packages
American Football
nflverse is a collection of packages for obtaining and analyzing NFL data. The core nflverse includes nflfastR, nflseedR, nfl4th, nflreadr, and nflplotR.
nflfastR contains functions to efficiently scrape NFL play-by-play data from 1999 to the present. It is similar to nflscrapR, but much faster. All models required by nflfastR are hosted in fastrmodels.
nflreadr efficiently downloads data from GitHub repositories of the nflverse project, including pre-computed nflfastR data frames.
nfl4th consists of functions to calculate optimal fourth-down decisions in the National Football League. Data on fourth downs are collected from the NFL and ESPN.
nflseedR contains functions for ranking NFL teams according to the complex NFL tie-breaking rules. It includes division ranking, playoff seeding, and draft order.
nflplotR includes functions that make NFL data visualization in ggplot2 easier.
NFLSimulatoR consists of tools for simulating plays and drives and for evaluating in-game strategies in the NFL.
fflr provides functions to access raw fantasy football data from the ESPN fantasy football API and to format the raw data.
ffscrapr helps access various fantasy football APIs including MFL, Sleeper, ESPN, and Fleaflicker with a consistent interface and built-in authentication, rate-limiting, and caching.
ffsimulator allows users to simulate fantasy football seasons using bootstrap resampling. Simulations are based on historical rankings and data from the package nflfastR. In addition, functions for computing optimal lineups and aggregating results are provided.
gsisdecoder contains functions to decode NFL Player IDs for use in conjunction with the nflfastR package.
cfbfastR provides functions for accessing college football play-by-play data from collegefootballdata.com.
Association Football (Soccer) ⚽
European soccer data is available through the engsoccerdata package, which contains match results for English and other European soccer leagues dating back to 1871.
socceR provides functions for evaluating soccer predictions and simulating results from soccer matches and tournaments.
fbRanks helps with estimating team strengths and rankings using time-dependent Poisson regression and data on the number of goals scored.
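A minimal sketch of a time-weighted Poisson model in base R, assuming exponential down-weighting of older matches with a decay rate xi chosen purely for illustration. Each match contributes two rows (goals for each side), modeled with team attack and defense effects plus a home indicator. The match data are made up, and this is neither fbRanks' interface nor necessarily its exact weighting scheme.

```r
# Made-up results: home team, away team, goals, and days since each match
matches <- data.frame(
  home     = c("A", "B", "C", "A", "B", "C"),
  away     = c("B", "C", "A", "C", "A", "B"),
  hg       = c(2, 1, 0, 3, 1, 2),
  ag       = c(1, 1, 2, 0, 2, 2),
  days_ago = c(150, 120, 90, 60, 30, 10)
)

xi <- 0.005  # assumed decay rate per day (illustrative only)

# Long format: one row per team per match
long <- data.frame(
  goals   = c(matches$hg, matches$ag),
  attack  = factor(c(matches$home, matches$away)),
  defense = factor(c(matches$away, matches$home)),
  at_home = rep(c(1, 0), each = nrow(matches)),
  w       = rep(exp(-xi * matches$days_ago), 2)  # recent matches weigh more
)

fit <- glm(goals ~ attack + defense + at_home, family = poisson,
           data = long, weights = w)
coef(fit)  # log-scale attack/defense effects relative to team A
```

Larger values of xi make the ratings react faster to recent form at the cost of using less of the history.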
ggsoccer provides functions for visualizing soccer event data in ggplot2.
qqr is a collection of Brazilian Soccer Championship data on match statistics since 2014.
footballpenaltiesBL contains data and plotting functions for analyzing penalty kicks in the German Men’s Bundesliga from 1963-64 to 2016-17.
footBayes consists of functions for fitting widely known soccer models (double Poisson, bivariate Poisson, Skellam, Student’s t) through Hamiltonian Monte Carlo and Maximum Likelihood estimation approaches using Stan. The package also provides tools for visualizing team strengths and predicting match outcomes.
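Once team scoring rates are estimated, a double (independent) Poisson model turns them into match-outcome probabilities. The sketch below assumes independent Poisson goal counts with given rates and truncates the score grid at max_goals, a hypothetical parameter; it is not footBayes' API.

```r
# Win/draw/loss probabilities under independent Poisson goal counts
match_probs <- function(lambda_home, lambda_away, max_goals = 10) {
  gh <- dpois(0:max_goals, lambda_home)
  ga <- dpois(0:max_goals, lambda_away)
  joint <- outer(gh, ga)  # joint[i, j] = P(home scores i - 1, away scores j - 1)
  c(home = sum(joint[lower.tri(joint)]),  # home goals > away goals
    draw = sum(diag(joint)),
    away = sum(joint[upper.tri(joint)]))
}

match_probs(lambda_home = 1.6, lambda_away = 1.1)
```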
itscalledsoccer enables access to American soccer (MLS, NWSL, and USL) data through the American Soccer Analysis app API.
Australian Rules Football
fitzRoy is a package for scraping and processing Australian Football League (AFL) data. fitzRoy provides access to publicly available data sources such as AFL Tables, Footy Wire, and The Squiggle.
Baseball ⚾
Historical baseball data is available through the Lahman package, which contains season-level data for Major League Baseball going back to 1871.
retrosheet facilitates downloading game logs, team IDs, rosters, play-by-play data, and other files from Retrosheet.org and returns the results as data frames. Local caching can be employed to improve efficiency. Note that the play-by-play data are returned directly from the event files and are not parsed (i.e., Chadwick is not bundled).
pitchRx provides access to pitch-level data through the Major League Baseball Advanced Media API. The package is featured prominently in Marchi, M., Albert, J., and Baumer, B. S. (2018). Analyzing baseball data with R (doi:10.1201/9781351107099). For a full description of the package, see Sievert, C. (2014). Taming PITCHf/x Data with XML2R and pitchRx (doi:10.32614/RJ-2014-001).
mlbstats provides functions for vector-based computation of many baseball statistics, both traditional and sabermetric.
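As an example of the vector-based style described above, on-base percentage can be written as a single vectorized expression, OBP = (H + BB + HBP) / (AB + BB + HBP + SF). The function name obp() and its argument names are illustrative assumptions, not mlbstats' own interface.

```r
# On-base percentage, vectorized over players
obp <- function(H, BB, HBP, AB, SF) {
  (H + BB + HBP) / (AB + BB + HBP + SF)
}

# Two hypothetical players
obp(H = c(150, 90), BB = c(60, 20), HBP = c(5, 2), AB = c(520, 400), SF = c(5, 3))
```

Because R arithmetic is vectorized, the same function works unchanged on a single player or on a whole column of a data frame.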
baseballDBR leverages the backend database functionality of dplyr to build local databases that mirror the data contained in Lahman. Like mlbstats, it also includes functions to compute baseball statistics, but on data frames rather than vectors.
baseballr consists of functions for extracting and analyzing baseball data from various sources such as Baseball Reference, FanGraphs, and Baseball Savant.
Basketball
BAwiR is a collection of tools to analyze basketball data, with focus on data scraping and visualization.
AdvancedBasketballStats provides functions to calculate and analyze basketball statistics for players, teams, lineups (quintets), and plays.
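Two common shooting-efficiency measures give a flavor of such calculations: effective field goal percentage, eFG% = (FGM + 0.5 * 3PM) / FGA, and true shooting percentage, TS% = PTS / (2 * (FGA + 0.44 * FTA)). The helpers below are hypothetical sketches, not functions exported by AdvancedBasketballStats.

```r
# Effective field goal percentage: a made three counts 1.5 times a made two
efg <- function(fgm, fg3m, fga) (fgm + 0.5 * fg3m) / fga

# True shooting percentage: points divided by twice the true shooting attempts,
# where 0.44 * FTA estimates the attempts used by free throws
ts_pct <- function(pts, fga, fta) pts / (2 * (fga + 0.44 * fta))

efg(fgm = 40, fg3m = 10, fga = 85)
ts_pct(pts = 110, fga = 85, fta = 20)
```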
uncmbb contains data on University of North Carolina (at Chapel Hill) men's basketball results since the 1949-50 season.
hoopR consists of functions for accessing men's college basketball and NBA data from various sources, including ESPN, NBA Stats API, and Ken Pomeroy's college basketball ratings.
Chess ♟
chess is an opinionated R wrapper around python-chess. It reads and writes PGN files and SVGs of game boards.
stockfish implements the UCI open communication protocol and ships with Stockfish, a popular, open source, powerful chess engine written in C++.
Like chess, bigchess reads and writes PGN files. And like stockfish, bigchess provides an API to UCI chess engines. bigchess can also read multiple game files at once without copying them to RAM.
rchess provides functions for chess validations, piece movements, check detection, and plotting chess boards.
chessR allows users to obtain game data from online chess applications, including chess.com and Lichess.
Cricket
yorkr provides functions for analyzing statistics of cricket players and teams based on Cricsheet data.
cricketr is a collection of tools for analyzing cricket performances of players and teams based on ESPN Cricinfo Statsguru data.
cricketdata includes functions to obtain international cricket data from two major sources, ESPNCricinfo and Cricsheet.
Esports
RDota2 contains functions for retrieving data for the video game Dota 2 from the Steam API.
GPS Tracking
trackeR and trackeRapp provide tools for analyzing running, cycling and swimming data from GPS-enabled tracking devices within R. These two packages allow users to tidy and explore data from workouts and competitions.
rStrava contains functions to access Strava activity data from the Strava API.
A detailed overview of tools for processing and analyzing tracking data can be found in the Tracking CRAN Task View.
Hockey
NHLData contains scores from NHL games dating back to 1917. Data are stored one season at a time and contain scores for every game of that season.
Access to data exposed by the NHL API is provided by the nhlapi and nhlscrape packages.
fastRhockey provides API wrappers for the NHL and Premier Hockey Federation (PHF), formerly known as the National Women’s Hockey League (NWHL).
Racket Sports
squashinformr consists of functions for retrieving data on the Professional Squash Association World Tour from SquashInfo.
Softball
runexp provides methods for estimating runs scored in softball. In particular, runexp centers on theoretical expectations computed with discrete Markov chains and empirical distributions obtained by multinomial random simulation.
Swimming
SwimmeR reads swimming results in a variety of formats and returns them in a tidy data frame. It also includes functions for converting times between short-course yards (SCY), short-course meters (SCM), and long-course meters (LCM).
Track and Field
combinedevents contains functions for calculating scores and marks for combined events competitions in track and field, based on the scoring tables of the International Association of Athletics Federations (IAAF).
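The scoring tables follow a power-law formula. For running events, points = floor(A * (B - T)^C), where T is the time in seconds; field events use the measured mark M in the form A * (M - B)^C. The sketch below assumes the commonly cited decathlon 100 m coefficients (A = 25.4347, B = 18, C = 1.81), which should be checked against the official tables before use. track_points() is a hypothetical helper, not the combinedevents API.

```r
# Running-event points: floor(A * (B - T)^C), with T the time in seconds.
# Defaults are the commonly cited decathlon 100 m coefficients (assumption).
track_points <- function(time, A = 25.4347, B = 18, C = 1.81) {
  floor(A * pmax(B - time, 0)^C)  # times at or beyond B score 0
}

track_points(c(10.395, 11.00, 12.00))
```

A 100 m time of about 10.4 s lands close to 1000 points, and slower times score progressively fewer.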
JumpeR consists of functions for importing (primarily) and analyzing track and field data.
Volleyball
volleystat contains match statistics from the German Volleyball Bundesliga from 2013-14 to 2018-19. Data were extracted from the league homepage.
Modeling
Many functions for modeling in sports analytics are available in base R (e.g., lm() and glm() from the stats package). In addition, other CRAN Task Views such as Bayesian, MachineLearning, Robust, Spatial, and SpatioTemporal may contain appropriate packages for applying statistical methods to sports.
Betting
odds.converter provides functions for converting common sports betting odds types, including US odds, Hong Kong odds, Decimal odds, Indonesian odds, Malaysian odds, and raw probability.
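The core conversions rest on a few standard formulas: positive US odds u give decimal odds 1 + u/100, negative US odds give 1 + 100/|u|, Hong Kong odds are decimal odds minus 1, and the raw implied probability is the reciprocal of the decimal odds. The helper names below are hypothetical and do not reproduce the odds.converter functions.

```r
# Standard odds conversions (helper names are hypothetical)
us_to_decimal <- function(us) ifelse(us > 0, 1 + us / 100, 1 + 100 / abs(us))
decimal_to_hk <- function(dec) dec - 1    # Hong Kong odds: net profit per unit staked
decimal_to_prob <- function(dec) 1 / dec  # raw implied probability (includes margin)

dec <- us_to_decimal(c(150, -200))
dec                   # 2.5 1.5
decimal_to_hk(dec)    # 1.5 0.5
decimal_to_prob(dec)  # 0.4000000 0.6666667
```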
implied is a collection of functions that convert between bookmaker odds and probabilities, based on various algorithms.
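The simplest such method, often called basic or multiplicative normalization, inverts the decimal odds and rescales them to sum to 1, which removes the bookmaker's margin (overround). implied supports various algorithms; the sketch below shows only this basic one, and the function name and odds are illustrative.

```r
# Basic normalization: invert decimal odds, then rescale to sum to 1
implied_probs_basic <- function(decimal_odds) {
  raw <- 1 / decimal_odds
  raw / sum(raw)
}

odds <- c(home = 2.10, draw = 3.40, away = 3.60)  # illustrative prices
sum(1 / odds) - 1          # bookmaker margin (overround), about 0.048
implied_probs_basic(odds)
```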
pinnacle.data contains Pinnacle market odds, highlighted by a dataset of all wagering lines for the 2016 MLB season.
RKelly computes the Kelly criterion for betting and provides functions to calculate outcome probabilities for multi-leg contests.
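For a single bet at decimal odds d with win probability p, the Kelly criterion stakes the fraction f* = (b * p - q) / b of the bankroll, where b = d - 1 and q = 1 - p. A minimal sketch, assuming a bet is skipped (f = 0) whenever the edge is negative; kelly_fraction() is a hypothetical name, not the RKelly API.

```r
# Kelly fraction of bankroll to stake on a single bet
kelly_fraction <- function(p, decimal_odds) {
  b <- decimal_odds - 1            # net odds received on a win
  f <- (b * p - (1 - p)) / b
  pmax(f, 0)                       # do not bet when the edge is negative
}

kelly_fraction(p = 0.55, decimal_odds = 2.0)  # 0.1
kelly_fraction(p = 0.40, decimal_odds = 2.0)  # 0 (negative edge)
```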
Ratings
BradleyTerry2 provides functions and examples for fitting Bradley-Terry models (doi:10.2307/2334029) to paired comparison data. Packages BSBT (Bayesian Spatial Bradley-Terry) and BTdecayLasso (Bradley-Terry Model with Exponential Time Decayed Log-Likelihood and Adaptive Lasso) provide implementations of extended versions of the Bradley-Terry model. See doi:10.18637/jss.v012.i01 for background on the predecessor package.
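Under the Bradley-Terry model, P(i beats j) = pi_i / (pi_i + pi_j), which is a logistic regression on the difference of log-abilities lambda_i - lambda_j. The sketch below fits that model with base R's glm() on made-up paired-comparison counts, fixing player A as the reference with lambda_A = 0. It is a bare-bones illustration, not the BradleyTerry2 interface.

```r
# Made-up paired-comparison counts
games <- data.frame(
  p1    = c("A", "A", "B"),
  p2    = c("B", "C", "C"),
  wins1 = c(7, 8, 6),
  wins2 = c(3, 2, 4)
)

# Design matrix: +1 for the first player, -1 for the second
players <- sort(unique(c(games$p1, games$p2)))
X <- matrix(0, nrow(games), length(players), dimnames = list(NULL, players))
X[cbind(seq_len(nrow(games)), match(games$p1, players))] <- 1
X[cbind(seq_len(nrow(games)), match(games$p2, players))] <- -1
X <- X[, -1, drop = FALSE]  # player A is the reference (log-ability 0)

# Logistic regression without intercept on lambda_i - lambda_j
fit <- glm(cbind(wins1, wins2) ~ X - 1, family = binomial, data = games)
coef(fit)  # log-abilities of B and C relative to A
```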
Methods for estimating Elo ratings in sports can be found in the elo, welo, EloOptimized, EloChoice, and EloRating packages. PlayerRatings also offers implementations of other rating systems, including Glicko, Glicko-2, and Stephenson, in addition to Elo.
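At the core of the Elo methods is a simple update: the expected score of A against B is E_A = 1 / (1 + 10^((R_B - R_A) / 400)), and A's rating changes by K * (S_A - E_A), with B changing by the opposite amount. The K-factor of 20 below is an illustrative assumption; the packages listed above may choose K differently or add extensions.

```r
# Single Elo update; score_a is 1 (A wins), 0.5 (draw), or 0 (A loses)
elo_update <- function(r_a, r_b, score_a, k = 20) {
  expected_a <- 1 / (1 + 10^((r_b - r_a) / 400))
  delta <- k * (score_a - expected_a)
  c(r_a = r_a + delta, r_b = r_b - delta)  # B's change mirrors A's
}

elo_update(1500, 1500, score_a = 1)  # r_a = 1510, r_b = 1490
```

Because the update is zero-sum, the total rating pool stays constant; a favorite that only draws loses points.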
piratings computes pi-ratings for determining team ability in association football, as described in Constantinou and Fenton (2013) (doi:10.1016/j.knosys.2013.05.008).
mvglmmRank provides functions for building multivariate generalized linear mixed models for ranking teams in sports.