The Data Daily

How To Read This Chart: You can't Judge a season by its homers

I do not like the New York Yankees.
I'm not going to get into a big debate about this. I know people like them and I know that they have a long history and blah blah blah. My distaste for them is like my distaste for mushrooms: I find both unpleasant and I find insistences that I'm missing out on something good irritating. The texture is bad, the taste is bad, the history of how they got in front of me is unpleasant. I will not be deterred.
I share that, though, because it is important context for the primary topic of discussion today, which is a guy named Aaron Judge.
Aaron Judge plays for the Yankees and he hit a lot of homers this year. You probably heard about this. (If you didn't, I'd say that's a significant strike against my “data people tend to like baseball” thesis.) But that success is mired in multiple debates over what constitutes a single-season home run record. And those debates, it turns out, can be informed through the use of DATA VISUALIZATION.
I pulled data from the always-great site baseball-reference.com. What's shown below is the running count of home runs over the course of each of the 11 seasons with the highest single-season totals.
You can see Judge's 2022 season in black. That line traces the cumulative number of home runs he hit this year as each of the 162 games unfolded. In games where he hit a home run, the line ticks up. If he didn't hit a home run, the cumulative total stays the same and the line just shifts to the right. What's shown, then, isn't just the total number of home runs but the pace at which they were hit.
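For anyone who wants to build that kind of line themselves, here is a minimal sketch of the running-total step. The function name and the sample numbers are made up for illustration; this is not the actual data or the code behind the chart.

```python
from itertools import accumulate


def cumulative_home_runs(per_game):
    """Return the running home run total after each game.

    `per_game` is a list with one entry per game: how many home runs
    the player hit in that game (0 when he hit none).
    """
    return list(accumulate(per_game))


# Made-up counts for a hypothetical 10-game stretch (not real data).
sample = [0, 1, 0, 0, 2, 0, 1, 0, 0, 1]
print(cumulative_home_runs(sample))
# [0, 1, 1, 1, 3, 3, 4, 4, 4, 5]
```

Games without a home run repeat the previous total, which is what produces the flat stretches in the line.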
I've also color-coded the seasons, breaking them out into three groups: seasons played before 1990, seasons from 1990 to 2010 and seasons since. Of the 11 seasons at issue, two happened since 2010 and three before 1990. The other six fell between 1990 and 2010 — or, more specifically, between 1998 and 2001.
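The grouping itself is simple enough to sketch in a few lines. The function name is hypothetical, and putting 2010 in the middle group is an assumption on my part, since none of the 11 seasons falls on that boundary anyway.

```python
def era_group(year):
    """Bucket a season into the three groups described above."""
    if year < 1990:
        return "before 1990"
    if year <= 2010:  # assumption: the boundary year goes in the middle group
        return "1990 to 2010"
    return "since 2010"


# A few of the seasons mentioned in this newsletter.
for year in (1927, 1961, 2001, 2022):
    print(year, era_group(year))
```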
This is known colloquially as the steroid era.
Those six seasons, in which Barry Bonds, Sammy Sosa and Mark McGwire all jockeyed to set the single-season record for home runs, also make up the six seasons with the most home runs ever hit. Bonds and McGwire both later admitted to using performance-enhancing drugs.
What is so striking about this version of the chart is that line representing Bonds's record-setting season. That flat part in the middle is a period of more than a dozen games in which he didn't hit any home runs at all — nearly a tenth of the season! Yet he still set the record. And his trajectory kept creeping higher late in the season, even as pitchers simply started not throwing him any pitches to hit. Only two major league seasons have seen a player walked more times than Bonds was in 2001: Bonds in 2002 and Bonds in 2004.
But you can see why a lot of Yankee fans think these records shouldn't count. Setting aside McGwire and Sosa, if Bonds hadn't used steroids in 2001, would he have surpassed the record set by Roger Maris in 1961 of 61 home runs? (Skeptics might ask whether Yankee fans would object to Bonds's record if he'd been wearing a Yankee uniform instead of that of the San Francisco Giants when it happened, but I would never be so crass.)
If we crop those home runs out, we get New York Yankee Aaron Judge at the top. He wasn't always ahead of Maris's pace from 1961 or Babe Ruth's from 1927. (I was unable to determine the teams for which Maris and Ruth played; it's a mystery.) But you can see that surge just before the 100-game mark that skyrocketed Judge upward.
Now we get to an interesting little detail about these non-steroid-burdened records. You may recall that Billy Crystal directed a 2001 movie (big year for baseball) called “61*.” The asterisk in the title was a reference to a scandal that surrounded Maris's record: he had more games that year in which to hit home runs than Ruth did in 1927. So Ruth fans saw Maris's mark as illegitimate, since his performance was enhanced by an extended season.
Judge avoids that critique. You can see that in the inset below. The blue line is Ruth. It ends before the orange (Maris) or black (Judge) lines do because his season extended for fewer games. And you can see how, when Ruth's blue-line season ended, the orange line was underneath his. (Hence the asterisk.) But you can also see that Judge's is above it, by one home run.
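The comparison in that inset boils down to reading each cumulative line at the game where the shortest season ends. Here is a rough sketch of that idea, with a hypothetical helper and invented numbers standing in for the real seasons.

```python
def total_through_game(cumulative, game_number):
    """Running total after `game_number` games (1-indexed).

    If the season was shorter than `game_number`, the final total is used.
    """
    last = min(game_number, len(cumulative))
    return cumulative[last - 1] if last > 0 else 0


# Invented cumulative lines (not real data): one short season, two longer ones.
short_season = [1, 1, 2, 3, 3]
longer_a = [0, 1, 1, 2, 2, 3, 4]
longer_b = [1, 2, 2, 3, 4, 4, 4]

cutoff = len(short_season)  # compare everyone at the shortest season's length
for name, line in [("short", short_season), ("longer A", longer_a), ("longer B", longer_b)]:
    print(name, total_through_game(line, cutoff))
```

In this toy example, one longer season trails the short one at the cutoff and the other leads it, which is the same kind of reading the inset invites.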
There's a lot of baseball history in that chart. Maybe not history in the sense of “the most home runs hit in a season,” but history nonetheless.
And, of course, congratulations to the New York Yankees on ending their 2022 season with fewer wins than the New York Mets.
 
What even are we doing here?: Self-reflection edition
It's been a while since I've elevated a chart that seemed confusing or poorly constructed and broken down how it didn't work. But this week, a chart I made and tweeted was the subject of a lot of consternation and complaint, so I figured I would explain how I made it, why it didn't work and how the reaction played out.
The chart was part of a story I wrote using federal data to show that fentanyl is generally smuggled across the border at official checkpoints, and often by U.S. citizens. It's an important point, often left out of the political debate. And so, to sell the story to people on Twitter, I lifted one of the charts out of the article and attached it to the tweet.
 
You can probably see the source of the complaints. The gradations between the shades of purple are admittedly subtle! That's particularly true on Twitter, where the size is generally more compact than in the article.
Here's the version that was in the article. Or, rather — since it's the same graph — here's how the chart looks when extricated from the context of the tweet.
I didn't think much about including it in the tweet since the point was that one particular type of seizure — at U.S.-Mexico border crossings — stood out as so obviously different than the others.
But by taking it out of the article, the chart lost context. Like proximity to the chart below, which shows seizure locations as percentages, not raw totals. Here, the color differences are less confusing since the colored sections are more obvious and the pattern they follow (darker, lighter, darker) more clear.
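Converting raw totals to percentages is a small step, sketched here under stated assumptions: the function name, the category labels and the counts are all placeholders, not the federal seizure data.

```python
def to_percent_shares(totals):
    """Convert raw totals per category into each category's percent share."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Nothing recorded at all, so every share is zero.
        return {name: 0.0 for name in totals}
    return {name: 100 * count / grand_total for name, count in totals.items()}


# Placeholder categories and counts (not real seizure figures).
sample = {"location A": 30, "location B": 50, "location C": 20}
for name, share in to_percent_shares(sample).items():
    print(f"{name}: {share:.1f}%")
```

Shares always add up to 100, so a percentage chart shows how the mix shifts even when the overall total changes a lot.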
Now, it is absolutely the case that I could have used five colors instead of two. But I chose two colors for two reasons. First, to contrast the U.S.-Mexico seizures (purple) from the non-border ones (orange). And, second, because The Washington Post has a color palette we use for graphics, and those colors can often look garish when juxtaposed with one another! A chart showing yellow and green and red and blue and orange would be more clear, but also probably fairly hideous.
What could I have done? I could have made some of the lines on the tweeted chart dashed or dotted. I could have taken some of the lines off, since part of the confusion was that people were trying to identify each of the five lines, which wasn't my intent. I could, in other words, have made a Twitter version, recognizing that Twitter users don't often click through to articles (for shame!) and that isolating the chart by itself necessarily means it will be considered differently.
I will also point out that several people suggested that whoever on The Post's graphics team had made the chart should be fired, because Twitter is the most efficient mechanism for being melodramatic that mankind has yet invented. So let me just put it on the record: With rare exception, any bad chart you ever see me share is a bad chart I myself made. So if anyone is to be fired, it should be me.
(Note to my editor: Please do not fire me. I'll do better.)
 
A remarkable historic visualization
A few weeks ago, this newsletter centered on interesting historic data visualizations: a map of CBS radio coverage from the 1930s; a review of the adoption of the zipper.
That prompted reader Rob Easley to send a much older example of visualizing information. Visiting Chicago's Field Museum, he came across the structure below. A nearby display explained what he was looking at: a map of islands in Micronesia (indicated with cowrie shells) connected by sticks that approximated the currents between them.
Photo of a cowrie-shell-and-stick map taken at Chicago's Field Museum. (Courtesy of Rob Easley)
The Field Museum's website has other examples.
At the risk of myself being melodramatic, it's a great example of how the visualization of information is a tool, not merely a hobby. This is a deeply practical use of visualization, one using materials at hand to serve a specific purpose.
If you have other examples of fascinating visualizations, of course, don't hesitate to send them to me. (You can just reply to this email!) Just please understand that, if the presentation is in any way complimentary to the New York Yankees, I will have to pass. I have used up my lifetime supply of graciousness in even talking about Judge's record, much less defending it.
Am I a hero? Yes.
 
