Treehouse of Horror

The Halloween Specials, Visualized

Told ya I wasn’t done with The Simpsons just yet. After comparing them to other animated shows, there is still more to tell. Here’s a quick look at the Treehouse of Horror episodes.


Using an R script file, I can reference the rvest scraper I made earlier without copy/pasting.
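For anyone who hasn't used source() before, here is a minimal sketch of the pattern. The temp file and the clean_title() helper are hypothetical stand-ins for illustration, not the actual scraper script.

```r
# Hypothetical helper script, written to a temp file so the example is self-contained.
helper_path <- tempfile(fileext = ".R")
writeLines(
  "clean_title <- function(x) trimws(x)",
  helper_path
)

# source() runs the file, so its functions become available in this session.
source(helper_path)

cleaned <- clean_title("  Treehouse of Horror  ")
```

Once the file is sourced, the helper can be called like any other function, which is the same idea as calling the scraper function from its own script.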



plot_caption <- "\nSource: IMDb"

After getting the data, we can do some plotting.

simpsons <- grab_imdb_ratings("tt0096697", c(1:31))

Treehouse of Horror Episodes Do Well

Here we use str_detect to pick out the relevant episodes. With some grouping, we can get the average rating for the non-Halloween episodes in each season and compare it against the October favorites. It turns out the Treehouse of Horror episodes tend to be rated higher than the average of the other episodes in the same season. They certainly are some of the most memorable shows.

# ratings of ToH over time
simpsons %>% 
  mutate(toh_episode = str_detect(title, "Treehouse")) %>% 
  group_by(season, toh_episode) %>% 
  mutate(avg_rating = mean(rating), 
         avg_votes = mean(votes)) %>% 
  select(show:season, toh_episode, avg_rating, avg_votes) %>% 
  distinct() %>% 
  ggplot(aes(season, avg_rating, color = toh_episode)) +
    geom_smooth(aes(group = toh_episode), se=FALSE, color = "#eeeeee") +
    geom_line() +
    scale_x_continuous(breaks = seq(1,31,2)) +
    scale_color_manual(values = c("#009EDC", "#F14E28")) +
    labs(
      title = "Television! Teacher, Mother, Secret Lover.",
      subtitle = "IMDb Ratings for Regular and Treehouse of Horror Simpsons Episodes",
      color = "ToH Episode",
      x = "Season",
      y = "Average Episode Rating",
      caption = plot_caption
    )


  • There was no Treehouse of Horror in the first season, so the red line has no value there.
  • Most ToH episodes are rated higher than the average rating for the other episodes in the season. It doesn’t start dipping till season eighteen.
  • That peak is Treehouse of Horror V, the third-highest-rated episode in the dataset.
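To see the flag-then-group step in isolation, here is a small sketch on made-up data. The titles and ratings below are invented for illustration and are not IMDb values. It uses summarise() as one possible alternative to mutate() followed by distinct(), since it collapses each group to a single row in one step.

```r
library(dplyr)
library(stringr)

# Made-up ratings for illustration only; these are not real IMDb values.
toy <- tibble(
  season = c(2, 2, 2, 3, 3, 3),
  title  = c("Treehouse of Horror", "Episode A", "Episode B",
             "Treehouse of Horror II", "Episode C", "Episode D"),
  rating = c(8.0, 7.0, 7.4, 8.4, 7.5, 7.9)
)

toy_summary <- toy %>%
  # Flag the Halloween specials by title.
  mutate(toh_episode = str_detect(title, "Treehouse")) %>%
  # One average per season, split by special vs. regular episodes.
  group_by(season, toh_episode) %>%
  summarise(avg_rating = mean(rating), .groups = "drop")
```

The result has one row per season and episode type, which is the shape the line plot needs.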

Do They Always Fall Near Halloween?

I called these episodes “October favorites” a little earlier. However, that often is not the case. Many Halloween specials aired at the start of November, after Halloween. Given that the show airs on Sundays and Halloween doesn’t always fall on that day, there is bound to be some wiggle room. For the first two decades, the Halloween special always fell within a week of the actual holiday. In later years, though, the special has been more variable, sometimes airing at the start of October, a full twenty-five days early.

# how close to Halloween?
simpsons %>% 
  filter(str_detect(title, "Treehouse")) %>% 
  # convert dates to just month/day by pinning every air date to the year 2000
  mutate(month_day = as.Date(paste0("2000-", format(air_date, "%m-%d"))),
         halloween_diff = as.numeric(abs(month_day - as.Date("2000-10-31")))) %>% 
  ggplot(aes(season, month_day, color = halloween_diff)) +
    geom_hline(yintercept = as.Date("2000-10-31"), color = "red", alpha = 0.4) +
    geom_hline(yintercept = as.Date("2000-10-24"), color = "red", alpha = 0.4, linetype = "dashed") +
    geom_hline(yintercept = as.Date("2000-11-07"), color = "red", alpha = 0.4, linetype = "dashed") +
    geom_point() +
    labs(
      title = "They're Showing a Halloween Episode...In November!",
      subtitle = "Treehouse of Horror Episodes by Date Aired",
      x = "Season",
      y = "Episode Air Date",
      color = "Days ± Oct 31st",
      caption = plot_caption
    )
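The month/day conversion is the fiddly part of that chunk, so here it is on its own as a base R sketch. The three air dates are illustrative examples rather than values read from the scraped data. paste0() is used so no space ends up between the year and the month.

```r
# Example air dates chosen for illustration (not read from the scraped data).
air_date <- as.Date(c("1990-10-25", "2000-11-01", "2013-10-06"))

# Keep the month and day, but pin every date to the year 2000.
month_day <- as.Date(paste0("2000-", format(air_date, "%m-%d")))

# Absolute distance from Halloween, in days.
halloween_diff <- as.numeric(abs(month_day - as.Date("2000-10-31")))
```

Pinning to a single year lets every season share one date axis. 2000 happens to be a leap year, so any month/day pair, including Feb 29, still converts to a valid date.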


  • Factoring out the code from the work with rvest into a separate R script made it easier to work with. Calling it with source was a cinch.
  • Getting ggplot to just plot the month and day as a pair was a little finicky. Did some date conversion and such to get it to work.
  • Tried to use geom_ribbon for a one-week band around Oct 31st, but ran into challenges. It was easier to just draw two lines.
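On the shaded band: one possible alternative to geom_ribbon is a single rectangle drawn with annotate(). This is only a sketch on made-up points (the seasons and dates below are invented), and it is not what the plot above uses.

```r
library(ggplot2)

# Illustrative points only; seasons and dates are made up for the sketch.
toh <- data.frame(
  season    = c(2, 12, 21, 25),
  month_day = as.Date(c("2000-10-25", "2000-11-01", "2000-10-18", "2000-10-06"))
)

p <- ggplot(toh, aes(season, month_day)) +
  # Shaded band one week either side of Oct 31, spanning the full x range.
  annotate("rect",
           xmin = -Inf, xmax = Inf,
           ymin = as.Date("2000-10-24"), ymax = as.Date("2000-11-07"),
           fill = "red", alpha = 0.1) +
  geom_hline(yintercept = as.Date("2000-10-31"), color = "red", alpha = 0.4) +
  geom_point()
```

Because the band is a fixed annotation rather than something mapped from the data, it needs no per-season values, which may sidestep the trouble with the ribbon.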

Image Credit

pumpkin by Zach Bogart from the Noun Project

Zach Bogart
Data Explorer

Science, Design, & Data. I’ll know it when I see it.