What genres of books have I read so far in 2025, and how many pages have I read? This is a short data visualization exercise. Credit to the Libby app and Goodreads for helping me keep track of all the books I read, and to this example from the R Graph Gallery for inspiring my genre bump plot.
library(readxl)
library(tidyverse)
library(ggbump)
library(ggpattern)
library(paletteer)
library(magick)
library(ggimage)
library(extrafont)
font_import(pattern = "Lucida Sans")
loadfonts(device = "win")
bookdata <- read_xlsx("bookdata.xlsx")

# Data manipulation to draw out books w/ multiple genres
# and rank genre by most read per month
bookdata_bump <- bookdata |>
  separate_rows(Genre, sep = ", ") |>
  mutate(
    .by = Month,
    genre_rank = match(
      Genre,
      names(sort(table(Genre), decreasing = TRUE))
    )
  )

# histogram of pages read each month
bookdata_hist <- bookdata |>
  select(Month, Pages) |>
  summarise(.by = Month, sum_pages = sum(Pages)) |>
  mutate(
    Month_str = case_when(
      Month == 1 ~ "Jan.",
      Month == 2 ~ "Feb.",
      Month == 3 ~ "Mar.",
      Month == 4 ~ "Apr.",
      Month == 5 ~ "May",
      Month == 6 ~ "June",
      Month == 7 ~ "July"
    ),
    image_file = "imgs/bookimage6.png"
  )

# Top genres by rating dataframe
bookdata_ratings <- bookdata |>
  separate_rows(Genre, sep = ", ") |>
  mutate(
    .by = Genre,
    avg_rating = mean(Rating),
    image_file = "imgs/bookimage6.png"
  ) |>
  mutate(
    .by = c(Month, Genre),
    sum_pages = sum(Pages)
  ) |>
  select(Month, Genre, avg_rating, sum_pages, image_file) |>
  distinct() |>
  arrange(desc(avg_rating))

# Get individual dataframes for top 5 rated genres
ratings_1 <- bookdata_ratings |>
  filter(avg_rating == max(avg_rating))
ratings_2 <- bookdata_ratings |>
  filter(avg_rating == sort(unique(avg_rating), TRUE)[2])
ratings_3 <- bookdata_ratings |>
  filter(avg_rating == sort(unique(avg_rating), TRUE)[3])
ratings_4 <- bookdata_ratings |>
  filter(avg_rating == sort(unique(avg_rating), TRUE)[4])
ratings_5 <- bookdata_ratings |>
  filter(avg_rating == sort(unique(avg_rating), TRUE)[5])

fave_genres <- c("Fantasy", "Science fiction", "Mystery")
other_genres <- setdiff(unique(bookdata_bump$Genre), fave_genres)
Bump Plot
This bump plot shows how the genres I read rank against each other month by month. A genre's rank in a given month is based on the number of books of that genre I read that month, with rank 1 going to the most-read genre. Books can have multiple genres, in which case each of their genres is counted separately.
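The ranking rule can be tried on a toy example. This is a minimal sketch in base R, assuming a made-up `genres` vector (not the real book data) that has already been split to one genre per entry. It applies the same `match()` over `sort(table())` expression used in the data preparation code.

```r
# Hypothetical genres for one month, one entry per (book, genre) pair
genres <- c("Fantasy", "Mystery", "Fantasy", "Romance", "Fantasy", "Mystery")

# Count how often each genre appears, most frequent first
genre_counts <- sort(table(genres), decreasing = TRUE)

# A genre's rank is its position in that sorted table (1 = most read)
genre_rank <- match(genres, names(genre_counts))

data.frame(genres, genre_rank)
```

Because `match()` returns a position in the sorted table, genres that are tied on count still receive distinct ranks, so no two genres share a spot on the plot within a month.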
book_bump_plot <- bookdata_bump |>
  ggplot(aes(x = Month, y = genre_rank, group = Genre)) +
  # Light gray lines for all genres; favorites are drawn over them below
  geom_bump(color = "#c9c9c9", lwd = 0.6) +
  geom_bump(aes(color = Genre), lwd = 0.8,
            data = ~ . |> filter(Genre %in% fave_genres)) +
  geom_point(color = "#c9c9c9",
             data = ~ . |> filter(Genre %in% other_genres)) +
  geom_point(color = "white", size = 3,
             data = ~ . |> filter(Genre %in% fave_genres)) +
  geom_point(aes(color = Genre), size = 4, shape = 21, stroke = 1,
             data = ~ . |> filter(Genre %in% fave_genres)) +
  # Add custom swords symbol for fantasy
  geom_point(shape = "\u2694", size = 2,
             data = ~ . |> filter(Genre == "Fantasy"),
             position = position_nudge(x = 0.025)) +
  # Add custom magnifying glass symbol for mystery
  geom_point(shape = "🔎", size = 2,
             data = ~ . |> filter(Genre == "Mystery"),
             position = position_nudge(x = 0.025)) +
  # Add custom symbol for science fiction
  geom_point(shape = "👾", size = 2,
             data = ~ . |> filter(Genre == "Science fiction"),
             position = position_nudge(x = 0.025)) +
  # Add labels for genres
  geom_text(aes(label = Genre, color = Genre), x = 7.15, size = 3.5,
            hjust = 0, fontface = "bold", family = "Lucida Sans",
            data = ~ . |> filter(Genre %in% fave_genres & Month == 7)) +
  geom_text(aes(label = Genre), color = "#838383", x = 7.15, size = 3.5,
            hjust = 0, family = "Lucida Sans",
            data = ~ . |> filter(Genre %in% other_genres & Month == 7)) +
  # Scale x and y axes for readability and style
  scale_x_continuous(breaks = seq(1, 7, 1), minor_breaks = seq(1, 7, 1),
                     labels = c("Jan.", "Feb.", "Mar.", "Apr.",
                                "May", "June", "July"),
                     limits = c(0.95, 8.1)) +
  scale_y_reverse(breaks = c(1, 3, 6, 9, 12, 15)) +
  # Manually set color and font themes
  scale_color_manual(values = c("#970000", "#5998d6", "#111111")) +
  labs(x = NULL, y = "Most-Read Genre",
       title = "What Genres Did I Read Each Month?") +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(family = "Lucida Sans"),
        plot.title = element_text(color = "#970000", size = 21),
        plot.title.position = "plot",
        axis.text = element_text(size = 11))

book_bump_plot
Bar Chart of Pages Read
Some months I read a lot; other months I'm too busy to make it through my to-read pile. Which months did I manage to squeeze in the most pages?
book_hist_plot <- bookdata_hist |>
  ggplot(aes(x = Month, y = sum_pages)) +
  geom_col_pattern(
    aes(pattern_filename = image_file),
    pattern = 'image',
    pattern_type = 'tile',
    pattern_filter = 'box',
    pattern_scale = -1
  ) +
  scale_pattern_filename_discrete(choices = bookdata_hist$image_file) +
  scale_x_continuous(breaks = seq(1, 7, 1),
                     labels = c("Jan.", "Feb.", "Mar.", "Apr.",
                                "May", "June", "July")) +
  labs(x = NULL, y = NULL,
       title = "How Many Pages Did I Read Each Month?") +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(family = "Lucida Sans"),
        axis.text = element_text(size = 11),
        axis.text.x = element_text(size = 13),
        plot.title = element_text(color = "#970000", size = 21),
        plot.title.position = "plot",
        panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank())

book_hist_plot
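To answer the "which month" question with a number rather than by eye, the monthly totals can also be inspected directly. The snippet below is only a sketch: `toy_books` is made-up data standing in for the real spreadsheet, and base R's `aggregate()` is swapped in for the `summarise(.by = Month, ...)` call so that it runs without extra packages.

```r
# Hypothetical reading log (made-up numbers, not the real bookdata.xlsx)
toy_books <- data.frame(
  Month = c(1, 1, 2, 3, 3, 3),
  Pages = c(320, 410, 280, 150, 500, 390)
)

# Total pages per month, the same quantity the bars show
pages_by_month <- aggregate(Pages ~ Month, data = toy_books, FUN = sum)

# Row for the month with the most pages read
busiest <- pages_by_month[which.max(pages_by_month$Pages), ]
busiest
```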
Top 5 Genres by Rating
Every year, Spotify Wrapped comes around and I see all the beautiful data that Spotify presents to me and my friends. In an attempt to emulate that presentation, I've compiled my top 5 genres by average rating, with the cover of my favorite book in each genre this year and a bar chart of pages read by month. Ratings are out of 5.
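As a quick illustration of how the top-rated genres are picked, here is a small sketch on made-up ratings. The `toy_ratings` data frame is hypothetical, and `sapply(split(...))` is a base R stand-in for the grouped `mutate()`; the `sort(unique(...), decreasing = TRUE)[n]` idiom is the same one used to build `ratings_2` through `ratings_5`.

```r
# Hypothetical long-format data: one row per (book, genre) pair, made-up ratings
toy_ratings <- data.frame(
  Genre  = c("Fantasy", "Fantasy", "Mystery", "Romance", "Mystery", "Horror"),
  Rating = c(5, 4, 3, 4, 4, 2)
)

# Mean rating per genre, returned as a named numeric vector
avg_rating <- sapply(split(toy_ratings$Rating, toy_ratings$Genre), mean)

# Distinct averages from highest to lowest; entry n is the n-th rating tier
rating_tiers <- sort(unique(avg_rating), decreasing = TRUE)

# Genres whose average matches the first and second tiers
top_genre    <- names(avg_rating)[avg_rating == rating_tiers[1]]
second_genre <- names(avg_rating)[avg_rating == rating_tiers[2]]
```

One consequence of filtering on distinct averages is that two genres with the same average rating land in the same tier, so a tier can contain more than one genre.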
# Helper function to make barcharts for top 5 genre ratings
# since each genre will have the same style of barchart
make_custom_barchart <- function(df_ratings) {
  p <- df_ratings |>
    ggplot(aes(x = Month, y = sum_pages)) +
    geom_col_pattern(
      aes(pattern_filename = image_file),
      pattern = 'image',
      pattern_type = 'tile',
      pattern_filter = 'box',
      pattern_scale = -1
    ) +
    scale_pattern_filename_discrete(choices = df_ratings$image_file) +
    scale_x_continuous(breaks = seq(1, 7, 1),
                       labels = c("Jan.", "Feb.", "Mar.", "Apr.",
                                  "May", "June", "July")) +
    labs(x = NULL, y = "No. Pages Read",
         title = glue::glue("{df_ratings$Genre[1]} (Average Rating: {round(df_ratings$avg_rating[1], 2)})")) +
    theme_minimal() +
    theme(
      legend.position = "none",
      text = element_text(family = "Lucida Sans"),
      axis.text = element_text(size = 11),
      axis.text.x = element_text(size = 13),
      axis.title.y = element_text(size = 14),
      plot.title = element_text(color = "#970000", size = 25),
      panel.grid.major.x = element_blank(),
      panel.grid.minor = element_blank(),
      plot.margin = unit(c(2, 1, 2, 1), 'cm')
    )
  p
}
Now that we’ve defined the helper function for the pages-read-per-month bar chart, we can make the visualization.