Part 14 Boxplots: visualize data distribution

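A boxplot summarizes the distribution of a numeric variable. With the ggplot2 defaults, the box spans the first to the third quartile (the interquartile range, IQR), the line inside the box is the median, and the whiskers extend to the most extreme values lying within 1.5 × IQR of the box; values beyond the whiskers are drawn as individual outlier points.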
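
As a sketch of what is computed for each box, the base R code below reproduces these statistics on a small made-up vector (x is hypothetical). It assumes the geom_boxplot() defaults: quartiles from quantile() with R's default method (type 7) and whiskers limited to 1.5 × IQR (the coef argument).

```r
# Hypothetical toy data: nine regular values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# First quartile, median and third quartile (quantile() type 7, R's default)
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Fences placed 1.5 x IQR beyond the box
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Whiskers end at the most extreme values still inside the fences
whisker_low  <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])

# Values outside the fences are drawn as individual outlier points
outliers <- x[x < lower_fence | x > upper_fence]

print(q)                             # 25%: 3.25, 50%: 5.50, 75%: 7.75
print(c(whisker_low, whisker_high))  # 1 9
print(outliers)                      # 30
```

For a finished ggplot, layer_data() returns the corresponding values for every box drawn (columns ymin, lower, middle, upper and ymax).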

We import data from a file that contains the same information as geneexp, but in a slightly different format that we will explain in more detail in a later section:

library(tidyverse)  # provides read_csv() (readr) and ggplot() (ggplot2)
geneexp2 <- read_csv("GSE277039/DEG_counts_sample_long.csv")
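
The file name suggests a "long" layout: one row per gene and sample, with the sample name in one column and the value in another, which is what aes(x=sample, y=expression) expects. As a minimal sketch of the difference, the code below converts a hypothetical three-gene "wide" table (gene IDs, sample names and DE labels are all made up) with tidyr::pivot_longer(); it is not necessarily how the course file was produced.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  DE   = c("up", "down", "none"),
  S1   = c(10, 250, 40),
  S2   = c(12, 230, 38),
  S3   = c(30, 180, 41)
)

# Long table: one row per gene x sample combination,
# sample names go to "sample", values go to "expression"
long <- pivot_longer(wide, cols = c(S1, S2, S3),
                     names_to = "sample", values_to = "expression")

print(long)
```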

In our first boxplot, one box corresponds to one sample:

ggplot(geneexp2, aes(x=sample, y=expression)) + 
  geom_boxplot()
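
To check that each box summarizes exactly one sample, layer_data() returns one row of computed statistics per box. The sketch below uses a hypothetical data frame, toy, with made-up samples S1 to S3 standing in for geneexp2.

```r
library(ggplot2)

# Hypothetical stand-in for geneexp2: 3 samples x 50 values each
set.seed(42)
toy <- data.frame(
  sample     = rep(c("S1", "S2", "S3"), each = 50),
  expression = c(rnorm(50, 5), rnorm(50, 7), rnorm(50, 6))
)

p <- ggplot(toy, aes(x = sample, y = expression)) +
  geom_boxplot()

# One row per box: whisker ends (ymin, ymax), quartiles (lower, upper), median (middle)
box_stats <- layer_data(p)[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
print(box_stats)
```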

Display the individual data points with geom_jitter(), which draws each point with a small random offset so that overlapping values remain visible:

ggplot(geneexp2, aes(x=sample, y=expression)) + 
  geom_boxplot() +
  geom_jitter()
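
With this combination, values beyond the whiskers appear twice: once as the boxplot's own outlier points and once as jittered points. A common adjustment, sketched below on hypothetical toy data, hides the boxplot outliers with outlier.shape = NA and restricts the jitter to a narrow horizontal band (width = 0.2) with no vertical displacement (height = 0), so the plotted y values stay exact. set.seed() only makes the random jitter reproducible.

```r
library(ggplot2)

# Hypothetical stand-in for geneexp2 (skewed values, so outliers occur)
set.seed(1)
toy <- data.frame(
  sample     = rep(c("S1", "S2"), each = 100),
  expression = rexp(200, rate = 0.1)
)

p <- ggplot(toy, aes(x = sample, y = expression)) +
  geom_boxplot(outlier.shape = NA) +    # do not draw outliers a second time
  geom_jitter(width = 0.2, height = 0)  # horizontal jitter only

print(p)
```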

Adjust point size and transparency:

ggplot(geneexp2, aes(x=sample, y=expression)) + 
  geom_boxplot() +
  geom_jitter(size=0.4, alpha=0.5)

Split the boxes according to a discrete variable, as done for barplots, by mapping that variable to fill or color:

ggplot(geneexp2, aes(x=sample, y=expression, fill=DE)) + 
  geom_boxplot()
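
If individual points are added to split boxes, geom_jitter() alone ignores the split: points of both DE levels scatter around the middle of each sample. position_jitterdodge() first dodges the points by the fill groups, like the boxes, and then jitters them. A sketch on hypothetical toy data (the DE values "yes"/"no" are made up):

```r
library(ggplot2)

# Hypothetical stand-in for geneexp2 with a two-level DE column
set.seed(2)
toy <- data.frame(
  sample     = rep(c("S1", "S2", "S3"), each = 60),
  DE         = rep(c("yes", "no"), times = 90),
  expression = rnorm(180, mean = 8, sd = 2)
)

p <- ggplot(toy, aes(x = sample, y = expression, fill = DE)) +
  geom_boxplot(outlier.shape = NA) +
  # Dodge points by DE like the boxes, then jitter them horizontally within each box
  geom_point(position = position_jitterdodge(jitter.width = 0.2, jitter.height = 0),
             size = 0.4, alpha = 0.5)

print(p)
```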

To draw violin plots instead of boxplots, replace geom_boxplot() with geom_violin():

ggplot(geneexp2, aes(x=sample, y=expression, fill=DE)) + 
  geom_violin()
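
Violins and boxes can also be combined: a narrow boxplot drawn on top of each violin adds the median and quartiles to the density shape. In the sketch below (hypothetical toy data), both layers share the same position_dodge(width = 0.9) so that each box sits at the centre of its violin; the width values are a judgment call, not a requirement.

```r
library(ggplot2)

# Hypothetical stand-in for geneexp2
set.seed(3)
toy <- data.frame(
  sample     = rep(c("S1", "S2"), each = 100),
  DE         = rep(c("yes", "no"), times = 100),
  expression = c(rnorm(100, 6), rnorm(100, 9))
)

# Shared dodging so violins and boxes line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(toy, aes(x = sample, y = expression, fill = DE)) +
  geom_violin(position = dodge) +
  # Narrow boxes on top of each violin, outliers hidden to reduce clutter
  geom_boxplot(width = 0.1, position = dodge, outlier.shape = NA)

print(p)
```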

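Violin plots also visualize the data distribution, but instead of summary statistics (median and quartiles) they show a kernel density estimate of the values in each group, mirrored on both sides of the axis. This makes features that a boxplot hides, such as several peaks (multimodality), visible.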
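
A small base R sketch of what a boxplot can hide: the hypothetical vector below has two clusters of values and none near 0, yet its median, the centre line of a boxplot, is exactly 0. The kernel density estimate from stats::density(), the kind of curve a violin plot mirrors, shows two peaks and almost no density at the median.

```r
# Hypothetical bimodal data: values between -4 and -2 and between 2 and 4
x <- c(seq(-4, -2, length.out = 200), seq(2, 4, length.out = 200))

# The boxplot's centre line falls in the empty gap between the clusters
print(median(x))  # 0

# Kernel density estimate with the default bandwidth
d <- density(x)
dens_at_0 <- approx(d$x, d$y, xout = 0)$y

# Count local maxima (peaks) of the density curve
n_peaks <- sum(diff(sign(diff(d$y))) == -2)
print(n_peaks)

# Density at the median relative to the highest peak
print(dens_at_0 / max(d$y))
```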