The Data Daily

Analyzing Professional Sports Team Colors with R

When working with the ggplot2 package, I often find myself playing around with colors for longer than I probably should. I think that this is because I know that the right color scheme can greatly enhance the information that a plot conveys, while a poorly matched palette can suppress the message of an otherwise good visualization.

With that said, I wanted to take a look at the presence of colors in the sports realm. I think some fun insight can be had from an exploration of colors used by individual sports teams. Some people have done some interesting technical research on this topic, such as studying the possible effects of color on fan and player perception of teams.

I show code only where I believe it complements the commentary throughout; otherwise, it is hidden. Nonetheless, the underlying code can be viewed in the raw .Rmd file for this write-up.

Although I list all of the packages used in this write-up (for the sake of reproducibility), I comment out those that are used only explicitly (i.e., via the “package::function” syntax). Only dplyr and ggplot2 are attached in full. Minimizing the namespace in this manner is a personal convention.

The data that I’ll use comes from the teamcolors R package, which itself is sourced from Jim Nielsen’s website for team colors. This data set provides color information for all teams from six professional sports leagues: the EPL, MLB, MLS, NBA, NFL, and NHL.

To begin, here’s a visualization of all the colors in this data set. Not much significance can be extracted from this plot, but it’s still nice to have as a mechanism for getting familiar with the data.

Note that quite a few teams lack a full set of four colors (and some lack a third or even a second color). Both the visualization and the tabulation indicate that MLB is missing the most colors (on a per-team basis). Perhaps this suggests that it is the most “dull” sports league. The NFL is on the other end of the spectrum (pun intended), with only 1.5% of its color values missing. Is it a coincidence that the NFL is the most popular sports league in the U.S.?

My subjective indictment of MLB as dull is certainly unfair and unquantitative. Does “dull” refer to hue, lightness, brightness, etc.? For the sake of argument, let’s say that I want to measure dullness by “brightness”, which, in the color lexicon, is interpreted as the arithmetic mean of the red-green-blue (RGB) values of a color. To rank the leagues by brightness, I can take the average of the RGB values (derived from the hex values) across all colors for all teams in each league. The resulting values, where a lower value indicates a darker color and a higher value indicates a brighter color, provide a reasonable measure by which each league’s aggregate color choices can be judged.

This calculation supports what we might have guessed by inspection: the NHL actually has the darkest colors. In fact, it seems that the NHL’s “darkness” is most prominent in the primary colors of the teams in the league. On the other hand, the NBA and the two soccer leagues (the MLS and the EPL) stand out as the leagues with the “brightest” colors.

Finally, just by inspection, it seems like there is an unusual pattern where a disproportionate number of teams in the MLS, NBA, and NFL have shades of gray as their tertiary colors. Using the same function as before, it can be shown indirectly, via relatively small standard deviation values, that there is not much variation in this color.
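The brightness measure described above can be sketched in a few lines of base R. This is only an illustration under my own assumptions: the toy data frame `colors_toy`, its league labels, and the helper `hex_to_brightness()` are hypothetical and are not taken from the teamcolors package or from the original code.

```r
# Toy data: hypothetical leagues and hex values (NOT from teamcolors).
colors_toy <- data.frame(
  league = c("A", "A", "B", "B"),
  hex = c("#000000", "#808080", "#FFFFFF", "#FF0000"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: brightness as the arithmetic mean of the
# red, green, and blue channels (each on a 0-255 scale).
hex_to_brightness <- function(hex) {
  rgb <- grDevices::col2rgb(hex)  # 3 x n matrix with rows red, green, blue
  unname(colMeans(rgb))
}

colors_toy$brightness <- hex_to_brightness(colors_toy$hex)

# Average brightness per league (lower = darker, higher = brighter),
# plus the standard deviation as a rough measure of variation.
brightness_mean <- tapply(colors_toy$brightness, colors_toy$league, mean)
brightness_sd <- tapply(colors_toy$brightness, colors_toy$league, sd)

brightness_mean
brightness_sd
```

The same two summaries could be computed per color rank (primary, secondary, and so on) by grouping on that column as well.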

Using a slightly customized version of the function, I can attempt to identify common colors (by name) from the hex values. I’ll bin the possible colors into a predefined set. (Without a binning strategy, one ends up with a sparser, less meaningful grouping of colors.) This set consists of the “rainbow” colors, as well as black, white, and two shades of grey.

Now, with the setup out of the way, I can easily compute the name of each color and identify the most common colors overall, as well as the most common primary and secondary colors.

add_color_nm_col <- function(data, rename = TRUE) {
  out <-
    ... %>%
    identify_color_name(set = colors_rnbw) %>%
    tibble::as_tibble() %>%
    bind_cols(data, .)

  if (rename) {
    out <-
      out %>%
      rename(color_nm = value)
  }
  out
}
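One way to picture the binning step is a nearest-neighbor lookup in RGB space: each hex value is assigned the name of the closest color in the predefined set. The sketch below is an assumption on my part, not the customized function used above. The reference set `colors_ref` (including its hex values for the rainbow colors and the two greys) and the helper `nearest_color_name()` are hypothetical.

```r
# Hypothetical reference set: "rainbow" colors plus black, white,
# and two shades of grey. The hex values are illustrative choices.
colors_ref <- c(
  red = "#FF0000", orange = "#FFA500", yellow = "#FFFF00",
  green = "#00FF00", blue = "#0000FF", indigo = "#4B0082",
  violet = "#EE82EE", black = "#000000", white = "#FFFFFF",
  grey_dark = "#404040", grey_light = "#BFBFBF"
)

# Hypothetical helper: for each hex value, return the name of the
# reference color with the smallest squared Euclidean distance in RGB.
nearest_color_name <- function(hex, ref = colors_ref) {
  rgb_in <- t(grDevices::col2rgb(hex))   # n x 3 matrix
  rgb_ref <- t(grDevices::col2rgb(ref))  # k x 3 matrix
  vapply(
    seq_len(nrow(rgb_in)),
    function(i) {
      # Subtract the i-th input color from every reference color.
      d2 <- rowSums(sweep(rgb_ref, 2, rgb_in[i, ])^2)
      names(ref)[which.min(d2)]
    },
    character(1)
  )
}

nearest_color_name(c("#FE0101", "#0A0A0A", "#FAFAFA"))
```

Plain RGB distance is the simplest choice here; a perceptual color space would likely bin some borderline shades differently.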

colors_named <-
  ... %>%
  add_color_nm_col()

Of course, a visualization is always appreciated.

ords <- ... %>% pull(ord)
color_nm_na <- ...

... %>%
  select(-hex, -league) %>%
  tidyr::complete(name, ord, fill = list(color_nm = color_nm_na)) %>%
  tidyr::spread(ord, color_nm)

colors_named_compl_ord2 <-
  ... %>%
  filter(primary != secondary) %>%
  count(primary, secondary, sort = TRUE) %>%
  filter(primary != "none") %>%
  filter(secondary != "none")
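To make the reshaping and counting steps concrete, here is a small self-contained sketch on toy data. The tibble `colors_named_toy`, its team names, and its color names are hypothetical; only the sequence of dplyr and tidyr verbs mirrors the code above, and I assume "none" as the fill value for missing colors.

```r
library(dplyr)

# Toy data: one row per team and color rank (hypothetical values).
colors_named_toy <- tibble::tribble(
  ~name,    ~ord,        ~color_nm,
  "Team A", "primary",   "blue",
  "Team A", "secondary", "orange",
  "Team B", "primary",   "blue",
  "Team B", "secondary", "orange",
  "Team C", "primary",   "red",
  "Team C", "secondary", "black",
  "Team D", "primary",   "green"   # no secondary color recorded
)

pairs_toy <-
  colors_named_toy %>%
  # Add the missing team/rank combinations, filled with "none".
  tidyr::complete(name, ord, fill = list(color_nm = "none")) %>%
  # One row per team, with one column per color rank.
  tidyr::spread(ord, color_nm) %>%
  filter(primary != secondary) %>%
  # Count how often each primary/secondary pair occurs.
  count(primary, secondary, sort = TRUE) %>%
  filter(primary != "none", secondary != "none")

pairs_toy
```

Team D drops out at the last step because its secondary color was filled in as "none".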

colors_named_compl_ord2_ig <-
  ... %>%
  igraph::graph_from_data_frame()

igraph::V(colors_named_compl_ord2_ig)$node_label
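For readers unfamiliar with igraph, the sketch below shows how a table of color pairs becomes a graph. The edge list `edges_toy` is hypothetical, and labeling each vertex with its own name is my assumption about how a label column might be filled, not necessarily what the original code does.

```r
# Hypothetical edge list: primary color, secondary color, team count.
edges_toy <- data.frame(
  primary = c("blue", "red", "blue"),
  secondary = c("orange", "black", "white"),
  n = c(2, 1, 1),
  stringsAsFactors = FALSE
)

# The first two columns define the edges (from, to); any remaining
# columns (here, n) are stored as edge attributes.
g <- igraph::graph_from_data_frame(edges_toy, directed = TRUE)

# Assumption: label each vertex (color) with its own name.
igraph::V(g)$node_label <- igraph::V(g)$name

igraph::vcount(g)  # number of distinct colors
igraph::ecount(g)  # number of primary/secondary pairs
igraph::E(g)$n     # team counts carried along as an edge attribute
```

With the counts stored on the edges, a plotting layer can scale edge width by `n` to highlight the most common pairings.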
