8.9 Boxplots

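A boxplot summarizes the distribution of a numeric variable: the box spans the first to the third quartile, the line inside the box marks the median, and points beyond the whiskers flag potential outliers.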

We will import data from a file that contains the same information as geneexp but in a slightly different format:

geneexp2 <- read_csv("DataViz_source_files-main/files/expression_20genes_long.csv")
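The file name suggests a "long" layout, with one row per measurement rather than one column per sample. As a rough illustration of that idea only, the sketch below reshapes a small made-up wide table with tidyr's pivot_longer(). The table `wide`, its column names, and its values are hypothetical; the actual layout of geneexp may differ.

```r
library(tidyr)

# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(
  gene = c("g1", "g2", "g3"),
  s1   = c(2.1, 5.0, 3.3),
  s2   = c(2.4, 4.7, 3.9)
)

# Reshape to long format: one row per gene/sample combination
long <- pivot_longer(
  wide,
  cols      = -gene,          # every column except gene holds measurements
  names_to  = "sample",       # former column names go here
  values_to = "expression"    # former cell values go here
)

long
```

A long table like this can be passed to ggplot() directly, because each aesthetic (x, y, fill) maps to exactly one column.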

In our first boxplot, one box corresponds to one sample:

ggplot(geneexp2, aes(x=sample, y=expression)) + 
  geom_boxplot()
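To make the parts of a box concrete, here is a minimal sketch that computes the statistics behind a single box by hand. The values in `toy_expression` are made up for illustration and do not come from the course file. The 1.5 * IQR rule shown is the default whisker rule of geom_boxplot().

```r
# Made-up expression values for one sample (illustration only)
toy_expression <- c(2.1, 3.4, 3.9, 4.2, 4.8, 5.0, 5.5, 6.1, 6.3, 12.0)

# The box spans Q1 to Q3; the line inside the box is the median
q <- quantile(toy_expression, probs = c(0.25, 0.5, 0.75))

# Interquartile range and the 1.5 * IQR fences that limit the whiskers
iqr <- q[["75%"]] - q[["25%"]]
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Values outside the fences are drawn as individual points
outliers <- toy_expression[toy_expression < lower_fence |
                           toy_expression > upper_fence]

q
outliers
```

In this toy vector only the value 12.0 lies beyond the upper fence, so it would be drawn as a single point above the whisker.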

We can split boxes by DE, the same way we did for barplots, by mapping fill or color to the variable:

ggplot(geneexp2, aes(x=sample, y=expression, fill=DE)) + 
  geom_boxplot()

To draw a violin plot instead, replace geom_boxplot() with geom_violin():

ggplot(geneexp2, aes(x=sample, y=expression, fill=DE)) + 
  geom_violin()

Violin plots also visualize the distribution of the data. While a boxplot shows only summary statistics (median, quartiles, and outliers), a violin plot shows a kernel density estimate of the values in each group, so features such as multiple modes remain visible.
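The two geoms can also be combined, so that the density and the quartiles are visible at once. The following is a minimal sketch on simulated data. The data frame `toy` and its values are hypothetical and only borrow the column names used above. The fill color and box width are arbitrary choices.

```r
library(ggplot2)

# Simulated data with the same column names as geneexp2 (illustration only)
set.seed(1)
toy <- data.frame(
  sample     = rep(c("s1", "s2"), each = 50),
  expression = c(rnorm(50, mean = 5), rnorm(50, mean = 7))
)

# Draw the violin first, then a narrow boxplot on top of it
p <- ggplot(toy, aes(x = sample, y = expression)) +
  geom_violin(fill = "grey85") +
  geom_boxplot(width = 0.1, outlier.shape = NA)  # hide outlier points

p
```

Layers are drawn in the order they are added, so the narrow box sits on top of the violin. Setting outlier.shape = NA keeps the overlay uncluttered, at the cost of no longer marking individual extreme values.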