Survival Data – Sankey Diagram

Over the last few posts, we have been learning survival analysis using a dataset of 42 data points from an efficacy study of an experimental drug. The dataset has the following format.

group      gender  relapse
Treatment  Female  TRUE
Treatment  Female  TRUE
Treatment  Male    TRUE
Treatment  Female  FALSE
Treatment  Female  TRUE
Control    Male    TRUE
Control    Female  TRUE
Control    Female  TRUE
Control    Male    TRUE
Control    Female  TRUE
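If you want to follow along without the full dataset, the ten rows shown above can be typed in by hand. This is only a sketch: the name ill_data_sample is made up for illustration, it holds just the ten displayed rows rather than all 42, and the group column is called group1 on the assumption that this is the column the plotting code later refers to.

```r
# Hypothetical sample: only the ten rows shown in the table, not the full dataset.
# The column name group1 is an assumption, chosen to match the plotting code.
ill_data_sample <- data.frame(
  group1  = c(rep("Treatment", 5), rep("Control", 5)),
  gender  = c("Female", "Female", "Male", "Female", "Female",
              "Male", "Female", "Female", "Male", "Female"),
  relapse = c(TRUE, TRUE, TRUE, FALSE, TRUE,
              TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Inspect the structure: three columns, ten rows
str(ill_data_sample)
```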

Sankey diagram

A Sankey diagram is a visualisation technique for showing the flow of energy, material, or, in this case, events. The simplest example here is to visualise how the participants in the treatment and control groups flow into the two relapse outcomes.

Notice that all the participants in the control group had a relapse of the disease, whereas the outcomes in the treatment group were mixed.
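The widths of the flows in a Sankey diagram are simply counts, so the same pattern can be checked with a cross-tabulation. The sketch below uses only the ten rows from the table above (not the full 42), and the names sample_rows and flow_counts are made up for illustration.

```r
# Hypothetical sample: group and relapse values of the ten rows shown earlier
sample_rows <- data.frame(
  group1  = rep(c("Treatment", "Control"), each = 5),
  relapse = c(TRUE, TRUE, TRUE, FALSE, TRUE,
              TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Each cell is the number of participants in one group-to-outcome flow
flow_counts <- table(sample_rows$group1, sample_rows$relapse)
print(flow_counts)
```

In this sample, the Control row has no participants under FALSE, while the Treatment row is split across both outcomes.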

The plot was created by executing the following R code:

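library(ggsankey)
library(tidyverse)

# Reshape to the long format ggsankey expects; the columns are listed
# in the order the stages should appear, from left to right
df1 <- ill_data %>% make_long(group1, relapse)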

san_plot <- ggplot(df1, aes(x = x
                            , next_x = next_x
                            , node = node
                            , next_node = next_node
                            , fill = factor(node)
                            , label = node))
san_plot <- san_plot + geom_sankey(flow.alpha = 0.5
                                   , node.color = "black"
                                   , show.legend = FALSE)
san_plot <- san_plot + geom_sankey_label(size = 3, color = "black", fill = "white", hjust = 0.0)
san_plot <- san_plot + theme_bw()

san_plot
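To see what make_long() hands over to ggplot, it helps to run it on a tiny table and print the result. This is a minimal sketch with three made-up rows; the names sample_rows and long_rows are hypothetical, and relapse is stored as text here so that both columns share one type, which is an assumption of this sketch rather than something the original data requires.

```r
library(ggsankey)
library(dplyr)

# Three made-up participants, for illustration only
sample_rows <- data.frame(
  group1  = c("Treatment", "Treatment", "Control"),
  relapse = c("TRUE", "FALSE", "TRUE"),
  stringsAsFactors = FALSE
)

# One output row per participant per stage; next_x and next_node
# point to the following stage and are NA at the last stage
long_rows <- sample_rows %>% make_long(group1, relapse)
print(long_rows)
```

The columns x, node, next_x and next_node are exactly the ones mapped inside aes() in the plotting code above.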

Note that the ‘ggsankey’ package may not be available from CRAN, your usual repository. If that is the case, you can install it from GitHub by running the following two lines.

install.packages("remotes")
remotes::install_github("davidsjoberg/ggsankey")

Let’s add another stage to the Sankey diagram: gender.

# Listing gender as a third column adds a third stage after relapse
df1 <- ill_data %>% make_long(group1, relapse, gender)

san_plot <- ggplot(df1, aes(x = x
                            , next_x = next_x
                            , node = node
                            , next_node = next_node
                            , fill = factor(node)
                            , label = node))
san_plot <- san_plot + geom_sankey(flow.alpha = 0.5
                                   , node.color = "black"
                                   , show.legend = FALSE)
san_plot <- san_plot + geom_sankey_label(size = 3, color = "black", fill = "white", hjust = 0.0)
san_plot <- san_plot + theme_bw()

san_plot

Further resources

World Energy Flow 2019: IEA