Implementing soccerbars in ggplot2

A run through of an implementation of soccerbars that works with ggplot2

Posted by Joe O'Reilly on Saturday, July 10, 2021

Implementing soccerbars in ggplot2

What are soccerbars?

Soccerbars are a graphical representation of the results of a series of football fixtures. I first saw them posted on twitter and followed up on their origins - they seem to have been designed by The Social Networks Lab at ETH and are meant to be implemented as a sparkline type plot (i.e. a small inline graphic embedded within text or a table). They consist of a series of parallelograms, rectangles, and dots, representing win/loss, a score draw, or a goalless draw, respectively. The height of the rectangles or parallelograms represents the number of goals scored for and against the team of interest. The image below is an example taken from the ETH Social Networks Lab website, and shows a nicely annotated example plot. image from https://sn.ethz.ch/research/soccerbars.html

The creators have developed an R package to aid in the construction of soccerbars plots, but the framework they have used does not play nicely with ggplot2 and related packages. I decided it would be interesting to develop a more customisable approach to implementing soccerbars in R, one which will use the full breadth of the ggplot2 framework.


A ggplot2 implementation

To implement soccerbars in ggplot2 we need to structure our data appropriately, devise a method to calculate the dimensions and angles of results in which goals were scored, identify no-score draws, and identify games in which one team did not score. For each of these cases we then need to draw the appropriate element at the appropriate fixture.

As an example of use we shall apply the soccerbars visualisation to the results of games at the FIFA Women’s World Cup in which teams that have ever won the competition participated. This dataset is available through tidytuesday

Before doing anything else we will set up the environment by loading the required packages and defining some variables that will be used later to space out the plots correctly across multiple tournaments.

library(data.table) # manipulating data
library(patchwork) # arranging plots
library(ggplot2) # plotting
library(ggtext) # nice text options


# Define a vector detailing the total number of possible games played at each WC
# Used to add empty space on plot where a team did not play
# 6 in 91, 95, 99, 03, 08, 11
# 7 in 15, 19
# 50 total possible apps
all.apps <- data.table(
  year = c(
    rep(1991, 6),
    rep(1995, 6),
    rep(1999, 6),
    rep(2003, 6),
    rep(2007, 6),
    rep(2011, 6),
    rep(2015, 7),
    rep(2019, 7)),
  y.id = c(
    c(1:6),
    c(1:6),
    c(1:6),
    c(1:6),
    c(1:6),
    c(1:6),
    c(1:7),
    c(1:7)),
  id = 1:50
)

# coordinates for the start of each WWC
# used to plot labels and delimiting lines
wc.index <- all.apps[, .(md = min(id)), by = year]

# bar width
width = .35

Step 1 - The Data

For the most simple case in which soccerbars can be implemented we require a the results for a single team - including the index of the game (i.e. match date), goals for, goals against. For a more complex implementation we are likely to require results across an entire league or tournament, this means we will also require the identity of the home and away team in each match. The data is formatted such that each row represents a single fixture, with each row consisting of the columns identified above.

We will use the goals for and goals against columns to work out who won the game (or if it was a draw), and to derive the 4 points of the plotted quadrilateral shape if required. Similarly, we will use these columns to identify if a point needs to be plotted to designate a no-score performance.

We will read in the data and perform some light modification using data.table to retain only the games involving teams that have ever won the Women’s World Cup.

# collect all results in the required format
.collect.results <- function(country, dt){
  zz <- rbind(
    dt[home == country, .(year, yearly_game_id, gf = home_score, ga = away_score)], 
    dt[away == country, .(year, yearly_game_id, gf = away_score, ga = home_score)])
  setkey(zz, year, yearly_game_id)
  zz[, id := c(1:nrow(zz))]
  zz[, nation := country ]
  return(zz)
}

# Read the data
tt_data <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-09/wwc_outcomes.csv")

x <- data.table(tt_data)
x <- x[round != 'Third Place Playoff'] # remove third place playoffs

# split into home/away
x.home <- x[team_num == 1, .(
  year,
  yearly_game_id,
  home = team,
  home_score = score,
  team_num)]
x.away <- x[team_num == 2, .(
  year,
  yearly_game_id,
  away = team,
  away_score = score,
  team_num)]

# one row per game 
fixture.results <- 
  merge(
    x.home,
    x.away, 
    by = c(
      'yearly_game_id',
      'year')
  )[, .(
    year,
    yearly_game_id,
    home, 
    home_score,
    away,
    away_score)]

# only keep games involving ever world champions
fixture.results <- fixture.results[home %in% c('USA', 'GER', 'NOR', 'JPN') |
       away %in% c('USA', 'GER', 'NOR', 'JPN')]

# Get goals for/against for each country, for each game
nation.results <- rbindlist(
  lapply(
    X = c('USA', 'GER', 'NOR', 'JPN'),
    FUN = .collect.results,
    dt = fixture.results))


# add space for unplayed games (i.e. when a team had already been eliminated)
nation.results[, y.id := 1:.N, by = .(year, nation)]
nation.results <- 
  na.omit( # delete rows with NAs induced by the merge
    merge(
      nation.results[, .(year, nation, y.id, gf, ga)],
      all.apps,
      by = c('year','y.id'),
      all.y = TRUE)
)[, .(year, nation, gf, ga, id, y.id)]

nation.results[gf == ga, outcome := 'draw']
nation.results[gf > ga, outcome := 'win']
nation.results[gf < ga, outcome := 'loss']

If we print the data.table object we can see how the cleaned data is structured.

(nation.results)
##      year nation gf ga id y.id outcome
##   1: 1991    USA  3  2  1    1     win
##   2: 1991    GER  4  0  1    1     win
##   3: 1991    NOR  0  4  1    1    loss
##   4: 1991    JPN  0  1  1    1    loss
##   5: 1991    USA  5  0  2    2     win
##  ---                                  
## 156: 2019    USA  2  1 48    5     win
## 157: 2019    GER  1  2 48    5    loss
## 158: 2019    NOR  0  3 48    5    loss
## 159: 2019    USA  2  1 49    6     win
## 160: 2019    USA  2  0 50    7     win

Step 2 - Quadrilaterals

The main visual element of soccerbars is the bar itself. As these bars are angled we need to calculate the location of the 4 points of the bar, accoutring for the fact that the angle of the bar will change depending on the total number of goals scored in a match.

To achieve this we write a function that will return, for a given game, a data.frame (or data.table) that consists of the four x and y points required to draw the bar, and the match index (which is used later to arrange the bars chronologically).

# Build the x, y coordinates to plot the results
.build.poly <- function(
  md, # match index
  gf, # goals for
  ga, # goals conceded
  width = 0.35, # width of the bar
  angle = 14){ # angle 
  
  
  offset = (gf + ga) * sin(angle)
  offset = offset/10
  
  if(ga < gf){ # if a win, (md, md +offset, md + offset + width, md + width)
    dt <- data.table(
      y = c(-ga,
            gf,
            gf,
            -ga),
      x = c(md,
            md + offset,
            md + offset + width,
            md + width),
      md = rep(as.character(md), 4))
  } 
  
  
  if(ga > gf){ # if a loss, (md, md - offset, md - offset + width, md + width)
    dt <- data.table(
      y = c(-ga,
            gf,
            gf,
            -ga),
      x = c(md + offset,
            md ,
            md + width,
            md + offset + width),
      md = rep(as.character(md), 4))
  }
  
  if(ga == gf){ # if a draw, no offset
    offset <- 0
    dt <- data.table(
      y = c(-ga,
            gf,
            gf,
            -ga),
      x = c(md,
            md,
            md + width,
            md + width),
      md = rep(as.character(md), 4))
  }
  
  return(dt)
}

This function will then be called to return the data.table which will eventually be passed to ggplot() to actually draw each bar element.

# Convert the results to polygons
dt.polygons <- nation.results[, .build.poly(md = id, gf = gf, ga = ga, width = width), by = .(id, nation)]
dt.polygons <- merge(
  dt.polygons,
  nation.results[, .(md = as.character(id), nation, outcome)],
  by = c('nation', 'md'))

Printing a subset of the contents of the dt.polygons object demonstrates the format of the data and how we have constructed the points at which the quadrilateral will be drawn for both the x and y axes.

dt.polygons[md == 1 & nation == 'USA']
##    nation md id  y        x outcome
## 1:    USA  1  1 -2 1.000000     win
## 2:    USA  1  1  3 1.495304     win
## 3:    USA  1  1  3 1.845304     win
## 4:    USA  1  1 -2 1.350000     win

Step 3 - Dots

Deriving the required information for plotting the dots (for no-score draws and clean-sheets) is much more straightforward. We will simply place dots where required by picking x-axis values where a team didn’t score and offset on the y-axis as appropriate. The following block of code takes our results and does exactly this process. As with polygons, the goal here is to return a data.table containing everything that will be needed to plot all the required dots.

# Convert no-goal games to points
# first get unbalanced goalless teams
dt.points <- 
  nation.results[gf == 0 | ga == 0,
     .(nation, gf, ga, x = id + (width/2), y = 1)][
       ga == 0 & gf != 0, y := -y][, score := TRUE]
# then deal with goalless draws
dt.points <- rbind(
  dt.points,
  dt.points[gf == 0 & ga == 0, .(nation, gf, ga, x, y= -y, score = TRUE)], 
  dt.points[gf == 0 & ga == 0, .(nation, gf, ga, x, y= 0, score = FALSE)])

The structure of the dt.points object is similar to the other plot data objects that have been created so far.

head(dt.points, 10)
##     nation gf ga     x  y score
##  1:    GER  4  0 1.175 -1  TRUE
##  2:    NOR  0  4 1.175  1  TRUE
##  3:    JPN  0  1 1.175  1  TRUE
##  4:    USA  5  0 2.175 -1  TRUE
##  5:    GER  3  0 2.175 -1  TRUE
##  6:    NOR  4  0 2.175 -1  TRUE
##  7:    JPN  0  8 2.175  1  TRUE
##  8:    USA  3  0 3.175 -1  TRUE
##  9:    GER  1  0 3.175 -1  TRUE
## 10:    JPN  0  3 3.175  1  TRUE

Step 4 - Bringing it all together and adding design elements

As we are plotting results for multiple teams, we can chose to apply a multiple panel (i.e. facet) approach or a multiple plot approach to present all of the data concurrently. The facet approach takes a single line of code to achieve, but it severely limits the customisation that can be performed (at least without requiring lots of fiddling with each panel). An easier approach when we want to colour the bars by team and by result is to join multiple plots with the patchwork package. As we are looping over multiple teams it makes sense to wrap our ggplot() code into a function that can be applied to any team.

This function takes a data.table containing the polygon plotting information, the dot plotting information, the name of the team, and the wc.index object we defined earlier (which is used to space out the individual results correctly across tournaments). Each of the data.table objects that we have created are passed to ggplot() with inherit.aes=FALSE, which allows us to control the aesthetic style for each of the new geom_*() function calls independent of one another.

# Place plot construction in a function, use patchwork to arrange multiple 
# ggplot objects created with this function
build.plot <- function(team = 'USA', dt.polygons, dt.points, wc.index){
  p <- ggplot(dt.polygons[nation == team]) + 
    # add timeline 
    geom_hline(yintercept = 0, size = .5) +
    # add wc delimiter lines
    geom_vline(
      xintercept = wc.index$md,
      linetype = 3,
      color = 'grey30',
      size = .25) +
    # Draw bars
    geom_polygon(
      aes(
        fill = outcome,
        group = md,
        x = x,
        y = y),
      color = 'black',
      size = .5) +
    # draw 'no goal' points
    geom_point(
      data = dt.points[nation == team],
      inherit.aes = FALSE,
      aes(x = x, y = y),
      fill = 'black',
      shape = 21,
      size = 1,
      stroke = 1) +
    # draw 'no score draw' points
    geom_point(
      data = dt.points[score == FALSE & nation == team],
      inherit.aes = FALSE,
      aes(x = x, y = y),
      fill = 'white',
      shape = 21,
      size = 2,
      stroke = 1) +
    # Add a small buffer on the right of the axis
    scale_x_continuous(
      expand = c(0, 0),
      limits = as.numeric(
        dt.polygons[, c(min(md), max(id) + 1)])) + 
    # Scale height by team
    scale_y_continuous(
      expand = c(0, 0),
      limits = as.numeric(
        dt.polygons[nation == team, c(min(y), max(y))])) + 
    coord_equal(clip = 'off') + 
    # Hide the aesthetics legends
    guides(color = FALSE,
           fill = FALSE) +
    theme_void() +
    theme(panel.spacing.y = unit(0, "lines"),
          plot.margin = margin(10, 10, 10, 10),
          plot.background = element_rect(fill = '#f5f9ff', color = NA),
          panel.border = element_blank(),
          plot.title = element_text(family = 'Helvetica Neue'),
          plot.caption = element_text(
            family = 'Helvetica Neue',
            color = 'grey30',
            size = 7))
  
  return(p)
}

Ideally we want to add some nice aesthetic styling to our plots, and in the case of football it is logical that we would want to colour the bars by the team colours. To achieve this we need an object that stores three colours for each team, these will be used to colour a win, loss, or draw. We will then later exploit the fact that the build.plot() function we defined above returns a ggplot2 object to combine the returned plot with the correct object passed through ggplot::scale_*_manual().

# Define some national team colours, based on the team badge
pal.usa <- c(
  'draw' = '#1f2742',
  'win' = 'white',
  'loss' = '#bb2533'
)

pal.jpn <- c(
  'win' = 'black',
  'draw' = 'white',
  'loss' = '#e30016'
)

pal.ger <- c(
  'win' = '#00a768',
  'draw' = '#7a7878',
  'loss' = '#d80f18'
)

pal.nor <- c(
  'win' = '#00296b',
  'draw' = 'white',
  'loss' = '#d3162f'
)

The final plot will be built of many elements combined together using the patchwork package. The first element to plot is the soccerbars representation of The USWNT. For this plot we shall also add some text labels alongside the dotted lines marking the start of each tournament. To achieve this we will define a data.table containing the year and host nation of each tournament. This data is then passed to ggtext::geom_richtext() to be styled and placed on the USWNT plot only.

# Text labels marking each world cup year - add to the top patchwork plot
dt.wc.labels <- data.table(
  x = c(wc.index$md),
  # place at lowest point on the USA plot
  y = rep( 
    dt.polygons[nation == 'USA', min(y)], 8),
  # Build label of location + year
  label = paste0(
    c('China ', 'Sweden ', 'USA ', 'USA ',
      'China ', 'Germany ', 'Canada ', 'France ' ),
    wc.index$year))


# build the individual panels
# Construct the results for each team of interest
p.usa <- build.plot(
  'USA',
  dt.polygons,
  dt.points,
  wc.index) + 
  scale_fill_manual(values = pal.usa) + 
  geom_richtext(data = dt.wc.labels, # Add wc year annotations
                mapping = aes(x = x, y = y, label = label),
                color = 'grey10',
                family = 'Helvetica Neue',
                size = 1.75,
                fill = NA,
                label.color = NA,
                hjust = 0,
                vjust = 0.5) +
  labs(title = 'USA')
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

Next, we build the plots for all other nations. As we are not adding text labels for the other nations this is simply a matter of calling the build.plot() function with the relevant data supplied as the function parameters. Here we also use + to add team specific elements to each plot (title and colour scheme), exploiting the fact that build.plot() returns a ggplot2 object.

p.jpn <- build.plot('JPN',
                    dt.polygons,
                    dt.points,
                    wc.index) + 
  scale_fill_manual(values = pal.jpn) +
  labs(title = 'Japan')
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
p.ger <- build.plot('GER', 
                    dt.polygons,
                    dt.points,
                    wc.index) + 
  scale_fill_manual(values = pal.ger) +
  labs(title = 'Germany')
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
p.nor <- build.plot('NOR',
                    dt.polygons,
                    dt.points,
                    wc.index) + 
  scale_fill_manual(values = pal.nor) +
  labs(title = 'Norway')
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

As an example of a single plot element, here is the soccerbars visualisation for Germany that is stored in p.ger.

Finally, we build the total plot by combining the multiple elements together using the patchwork syntax.

# arrange all plots and save
p.total <- (p.usa / p.ger / p.nor / p.jpn)  + 
  # Add a caption, exploits plot_annotation() to color the background for coord_equal
  plot_annotation(
    caption = "Visualisation by Joe O'Reilly (github.com/josephedwardoreilly)\nInspired by https://sn.ethz.ch/research/soccerbars.html",
    theme = theme(
      plot.margin = margin(5, 0, 10, 0),
      plot.caption = element_text(
        family = 'Helvetica Neue',
        size = 8,
        colour = 'grey10'),
      plot.background = element_rect(
        fill = '#f5f9ff',
        color = NA)))

As we are using patchwork to build the plot, we can also glue other stylistic elements onto the plot. Below is an example in which I have included a legend and a title for the entire plot using this approach. The code for this more extensive plot is available here.