How do I count thee? Let me count the ways?
by Jerry Tuttle

In Major League Baseball, a player who hits 50 home runs in a single season has hit a lot of home runs. Suppose I want to count the number of 50-homer seasons by team, and also the number of 50-homer seasons by the New York Yankees. (I will count Maris and Mantle in 1961 as two.) Here is the data, including Aaron Judge’s 62 in 2022:
You would think base R would have a count function, such as count(df$Team) or count(df$Team == "NYY"), but each of these gives the error: could not find function "count". Base R does not have a count function. Base R has at least four ways to perform a count:

1. The table function counts the items in a vector. table(df$Team) presents the results horizontally, and data.frame(table(df$Team)) presents them vertically. table(df$Team == "NYY") displays 37 FALSE and 10 TRUE, while table(df$Team == "NYY")[2] displays just the 10 TRUE.
2. The sum function can count the number of rows meeting a condition. sum(df$Team == "NYY") displays 10. Here df$Team == "NYY" creates a logical vector, and sum adds it up, with each TRUE counting as 1 and each FALSE as 0.
3. Similar to sum, nrow(df[df$Team == "NYY", ]) counts the number of rows meeting the NYY condition.
4. The length function counts the number of elements in an R object. length(which(df$Team == "NYY")), length(df$Team[df$Team == "NYY"]), and length(grep("NYY", df[ , "Team"])) all count the 10 Yankees.

The more direct solution to counting uses the count function in the dplyr package. Note that dplyr’s count function applies to a data frame or tibble, but not to a vector. After loading library(dplyr):

1. df %>% count(Team) lists the count for each team.
2. df %>% filter(Team == "NYY") lists each Yankee, and you can see there are 10.
3. df %>% count(Team == "NYY") displays 37 FALSE and 10 TRUE, while df %>% filter(Team == "NYY") %>% count() displays just the 10.

The following is a bar chart of the results by team, for teams with at least one 50-homer season:
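As a quick check that the base R approaches listed above really are interchangeable, here is a minimal, runnable sketch. It assumes a small made-up data frame: the six teams below are hypothetical toy data, not the real list of 50-homer seasons. Every approach should report the same number of Yankee rows.

```r
# Hypothetical toy data: NOT the real list of 50-homer seasons
df <- data.frame(Team = c("NYY", "BOS", "NYY", "CHC", "NYY", "SFG"))

# Logical vector: TRUE for each Yankee row
is_nyy <- df$Team == "NYY"

counts <- c(
  table_way  = unname(table(is_nyy)["TRUE"]),     # 1. table of the logical vector
  sum_way    = sum(is_nyy),                       # 2. TRUE counts as 1, FALSE as 0
  nrow_way   = nrow(df[is_nyy, , drop = FALSE]),  # 3. rows kept by the condition
  which_way  = length(which(is_nyy)),             # 4a. positions of the TRUEs
  subset_way = length(df$Team[is_nyy]),           # 4b. matching elements
  grep_way   = length(grep("NYY", df[, "Team"]))  # 4c. pattern matches
)
print(counts)         # every entry is 3

# Per-team counts, like table(df$Team) in the text
print(table(df$Team))
```

One design note: with a one-column data frame such as this toy one, df[condition, ] drops to a plain vector, so nrow() would return NULL; drop = FALSE keeps the result a data frame.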
Finally, “How do I count thee? Let me count the ways?” is of course adapted from Elizabeth Barrett Browning’s poem “How do I love thee? Let me count the ways.” But in her poem, just how would we count the number of times “love” is mentioned? The tidytext package makes counting words fairly easy, and the answer is ten, the same as the number of 50-homer Yankee seasons. Coincidence? The following is all the R code. Happy counting!
library(dplyr)
library(ggplot2)
library(tidytext)

# df: one row per 50-homer season, with a Team column (creation code not shown)
df %>% count(Team)                         # lists the count for each team
df %>% filter(Team == "NYY")               # lists each Yankee; you can see there are 10
df %>% count(Team == "NYY")                # displays 37 FALSE and 10 TRUE, while
df %>% filter(Team == "NYY") %>% count()   # just displays the 10

# bar plot of all teams with at least one 50-homer season (uses ggplot2)
df %>%
  group_by(Team) %>%
  summarise(count = n()) %>%
  ggplot(aes(x = reorder(Team, count), y = count, fill = Team)) +
  geom_bar(stat = "identity") +
  ggtitle("Count of 50 Homer Seasons") +
  xlab("Team") +
  scale_y_continuous(breaks = 1:10) +
  coord_flip() +
  theme(plot.title = element_text(face = "bold", size = 18),
        axis.title.y = element_text(face = "bold"),
        axis.title.x = element_blank(),
        axis.text.x = element_text(size = 12, face = "bold"),
        axis.text.y = element_text(size = 12, face = "bold"),
        legend.position = "none")

# count number of times "love" is mentioned in Browning's poem (uses tidytext)
# textfile: the poem's lines as a character vector (reading code not shown).
# Assumed step: unnest_tokens() splits the lines into one lowercase word per row.
cleaned_words <- tibble(text = textfile) %>% unnest_tokens(word, text)
cleaned_words %>% head(6)
cleaned_words %>% filter(word == "love") %>% count()
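For comparison with the tidytext pipeline above, the same word count can be sketched in base R, reusing the sum and table ideas from the baseball example. This swaps in a deliberately simple tokenizer (an assumption: lowercase the text, then split on anything that is not a letter or apostrophe), which is cruder than tidytext’s unnest_tokens(). Only the first two lines of the sonnet are included, so the count here is 2, not ten.

```r
# Base R word count (a sketch; not the tidytext method used above).
# Only the first two lines of Browning's Sonnet 43 are included here.
poem <- c("How do I love thee? Let me count the ways.",
          "I love thee to the depth and breadth and height")

# Simple tokenizer (assumed): lowercase, split on runs of characters
# that are not letters or apostrophes, then drop any empty strings
words <- unlist(strsplit(tolower(poem), "[^a-z']+"))
words <- words[words != ""]

# Same idea as sum(df$Team == "NYY"): sum a logical vector
love_count <- sum(words == "love")
print(love_count)            # 2

# Same idea as table(df$Team): count every distinct word
word_freq <- table(words)
print(word_freq[["love"]])   # 2
```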