8.10 Boxplots

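A boxplot summarizes the distribution of a numeric variable with a few statistics: the median, the quartiles, and the range covered by the bulk of the data, with unusually extreme values drawn as individual points.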
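The numbers behind a single box can be computed by hand. Below is a minimal base-R sketch. The helper name box_stats() is hypothetical. It follows the rule described in the ggplot2 geom_boxplot() documentation: the hinges sit at the first and third quartiles, and the whiskers reach the most extreme values lying within 1.5 × IQR (interquartile range) of the hinges. Quartiles come from R's default quantile() method.

```r
# Hypothetical helper: the statistics drawn for one box
box_stats <- function(y, coef = 1.5) {
  q <- as.numeric(quantile(y, c(0, 0.25, 0.5, 0.75, 1)))
  iqr <- q[4] - q[2]                       # interquartile range (box height)
  # values beyond coef * IQR from the hinges are treated as outliers
  is_out <- y < q[2] - coef * iqr | y > q[4] + coef * iqr
  kept <- y[!is_out]                       # values inside the fences
  list(
    ymin = min(kept),                      # end of the lower whisker
    lower = q[2],                          # first quartile (bottom of box)
    middle = q[3],                         # median (line inside the box)
    upper = q[4],                          # third quartile (top of box)
    ymax = max(kept),                      # end of the upper whisker
    outliers = y[is_out]                   # points drawn individually
  )
}

s <- box_stats(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100))
str(s)
```

For this vector the box spans 3.25 to 7.75 and the median is 5.5. The whiskers stop at 1 and 9, and 100 is reported as an outlier.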

We will import data from a file that contains the same information as geneexp but in a slightly different format (explained in more detail in a later section):

library(readr)  # provides read_csv()
geneexp2 <- read_csv("DataViz_source_files-main/files/expression_20genes_long.csv")
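If the CSV file is not at hand, you can try the plots in this section on a small made-up table. It uses the same long layout, with one row per gene and sample. The column names sample, expression and DE are the ones used by the plotting code here. The gene column, the sample names and the "up"/"down" DE labels are hypothetical placeholders, and the values are random, not the course data.

```r
# Hypothetical stand-in for geneexp2: random values, NOT the course data
set.seed(42)
toy <- data.frame(
  gene = rep(paste0("gene", 1:20), times = 4),          # 20 genes ...
  sample = rep(paste0("sample", 1:4), each = 20),       # ... in each of 4 samples
  DE = rep(rep(c("up", "down"), each = 10), times = 4), # placeholder DE label per gene
  expression = rnorm(80, mean = 10, sd = 2)             # one value per gene and sample
)
head(toy)
table(toy$sample, toy$DE)
```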

In our first boxplot, one box corresponds to one sample:

ggplot(geneexp2, aes(x=sample, y=expression)) + 
  geom_boxplot()
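To see which numbers each box represents, you can ask ggplot2 for the data it computed for a layer with layer_data(). The sketch below runs on a small made-up dataset so that it works without the CSV file. The toy data frame and its values are hypothetical. With the real data you would pass the plot built from geneexp2 instead.

```r
library(ggplot2)

# Hypothetical toy data in long format (random values, not the course data)
set.seed(1)
toy <- data.frame(
  sample = rep(c("s1", "s2", "s3"), each = 20),
  expression = rnorm(60, mean = 10, sd = 2)
)

p <- ggplot(toy, aes(x = sample, y = expression)) +
  geom_boxplot()

# One row per box: whisker ends (ymin, ymax), quartiles (lower, upper), median (middle)
box_data <- layer_data(p)
box_data[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```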

As with barplots, mapping fill (or color) to DE splits each sample's box into one box per DE group:

ggplot(geneexp2, aes(x=sample, y=expression, fill=DE)) + 
  geom_boxplot()
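Color can be mapped instead of fill. A minimal sketch on hypothetical toy data follows, using random values with placeholder sample names and "up"/"down" DE labels. Only the box outlines change. geom_boxplot() still places the boxes of the two DE groups side by side within each sample.

```r
library(ggplot2)

# Hypothetical toy data: 3 samples x 20 genes, 10 labeled "up" and 10 "down"
set.seed(2)
toy <- data.frame(
  sample = rep(c("s1", "s2", "s3"), each = 20),
  DE = rep(rep(c("up", "down"), each = 10), times = 3),
  expression = rnorm(60, mean = 10, sd = 2)
)

# color colors the outlines by DE; boxes of one sample are still dodged
p_col <- ggplot(toy, aes(x = sample, y = expression, color = DE)) +
  geom_boxplot()
p_col
```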

Turning the boxes into violins only requires replacing geom_boxplot() with geom_violin():

ggplot(geneexp2, aes(x=sample, y=expression, fill=DE)) + 
  geom_violin()
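Because both geoms use the same aesthetics, they can also be layered. A common pattern draws a narrow boxplot inside each violin so that the quartiles and the shape of the distribution appear together. The sketch below uses hypothetical toy data. The explicit position_dodge(width = 0.9) keeps the boxes aligned with the dodged violins, and the box width of 0.1 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical toy data: 3 samples x 20 genes, 10 labeled "up" and 10 "down"
set.seed(3)
toy <- data.frame(
  sample = rep(c("s1", "s2", "s3"), each = 20),
  DE = rep(rep(c("up", "down"), each = 10), times = 3),
  expression = rnorm(60, mean = 10, sd = 2)
)

dodge <- position_dodge(width = 0.9)  # same dodging for both layers

p_vb <- ggplot(toy, aes(x = sample, y = expression, fill = DE)) +
  geom_violin(position = dodge) +
  # narrow boxes on top of the violins; outliers hidden because
  # the violins already show the full range of the data
  geom_boxplot(width = 0.1, position = dodge, outlier.shape = NA)
p_vb
```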

Violin plots also visualize the distribution of the data. A boxplot shows only summary statistics (quantiles), whereas a violin plot shows the estimated density of the values in each group. Features such as multiple peaks therefore become visible.
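The density in a violin plot is a kernel density estimate. In ggplot2, geom_violin() relies on stat_ydensity(), which by default uses a Gaussian kernel and the "nrd0" bandwidth rule of stats::density(). The base-R sketch below works on hypothetical bimodal data. It computes such an estimate, draws it mirrored like a violin outline, and prints the quartiles of the same values. The median falls between the two peaks, where almost no observations lie, and a boxplot alone would not reveal this.

```r
# Hypothetical bimodal data: two clusters of values (not the course data)
set.seed(4)
y <- c(rnorm(100, mean = 5, sd = 1), rnorm(100, mean = 12, sd = 1))

# Kernel density estimate with the same defaults geom_violin() uses
d <- density(y, bw = "nrd0", kernel = "gaussian")

# Draw the estimate vertically and mirror it: the outline of a violin
# (geom_violin() additionally rescales the width)
plot(d$y, d$x, type = "l", xlim = c(-max(d$y), max(d$y)),
     xlab = "density", ylab = "y")
lines(-d$y, d$x)

# The quartiles alone do not reveal the two peaks: the median lies
# in the gap between them, where hardly any values occur
quantile(y, c(0.25, 0.5, 0.75))
```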