11.3 Exercise 7: boxplot

  1. Convert the rnaseq object (read from DataViz_source_files-main/files/GSE150029_rnaseq_log2.csv) to long format, and save it as a new object called rnaseq2.

Note that the long-format table is also available as a separate file, if you prefer to read it in directly: DataViz_source_files-main/files/GSE150029_rnaseq_log2_long.csv.

correction
rnaseq2 <- rnaseq %>% pivot_longer(cols=c(CTRL, EZH), names_to = "sample", values_to = "log2_counts")
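
To see what the reshaping does, here is a minimal sketch on a made-up three-gene table. The toy object, its values and the gene column are invented for illustration; they only mimic the column layout assumed for rnaseq.

```r
library(tidyr)

# Hypothetical toy table in wide format: one row per gene, one column per sample.
toy <- data.frame(
  gene = c("geneA", "geneB", "geneC"),
  CTRL = c(2.1, 5.4, 0.0),
  EZH = c(3.3, 5.1, 1.2),
  gene_biotype = c("protein_coding", "lincRNA", "protein_coding")
)

# Long format: one row per gene/sample pair.
# The sample names move into "sample", the values into "log2_counts".
toy_long <- pivot_longer(toy, cols = c(CTRL, EZH),
                         names_to = "sample", values_to = "log2_counts")

dim(toy)       # 3 rows, 4 columns
dim(toy_long)  # 6 rows, 4 columns
```

Each gene now appears once per sample, which is the layout ggplot2 needs to map sample to an axis.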


  1. Create a boxplot that will represent the samples on the x axis, and their expression on the y axis.
correction
ggplot(data=rnaseq2, mapping=aes(x=sample, y=log2_counts)) + 
  geom_boxplot()


  1. Split the boxes per gene_biotype.
correction
ggplot(data=rnaseq2, mapping=aes(x=sample, y=log2_counts, fill=gene_biotype)) + 
  geom_boxplot()


  1. Select only protein_coding and lincRNA, and split the boxes by gene_biotype again.
correction
filter(rnaseq2, gene_biotype=="protein_coding" | gene_biotype=="lincRNA") %>%
  ggplot(mapping=aes(x=sample, y=log2_counts, fill=gene_biotype)) +
  geom_boxplot()
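
When several values need to be kept, the chain of == tests joined by | can also be written with %in%. The sketch below checks that both forms select the same rows on a small made-up table; toy_long and keep are hypothetical names, not objects from the course files.

```r
library(dplyr)

# Hypothetical toy data in long format.
toy_long <- data.frame(
  gene_biotype = c("protein_coding", "lincRNA", "snoRNA", "protein_coding"),
  log2_counts = c(4.2, 1.3, 0.8, 6.0)
)

# Biotypes to keep.
keep <- c("protein_coding", "lincRNA")

# Two equivalent filters: explicit OR versus membership test.
with_or <- filter(toy_long, gene_biotype == "protein_coding" | gene_biotype == "lincRNA")
with_in <- filter(toy_long, gene_biotype %in% keep)

nrow(with_in)               # 3 rows kept, the snoRNA row is dropped
all.equal(with_or, with_in)
```

The %in% form tends to scale better when the list of biotypes grows, since only the keep vector has to change.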


  1. Add a geom_violin() layer and set alpha=0.3 in geom_violin(). What does the alpha parameter control?
correction
filter(rnaseq2, gene_biotype=="protein_coding" | gene_biotype=="lincRNA") %>%
  ggplot(mapping=aes(x=sample, y=log2_counts, fill=gene_biotype)) +
  geom_boxplot() +
  geom_violin(alpha=0.3)

# alpha sets the transparency of the layer, from 0 (fully transparent) to 1 (fully opaque).
# With alpha=0.3, the boxplots remain visible through the violins.

# If the boxplots and violins are misaligned, you can adjust the position parameter of geom_violin(), for example:
# geom_violin(position=position_dodge(0.7))


  1. Look at the help page of geom_boxplot() and change the following parameters:
  • Set the outlier color to red
  • Set the outlier shape to triangles
correction
filter(rnaseq2, gene_biotype=="protein_coding" | gene_biotype=="lincRNA") %>%
  ggplot(mapping=aes(x=sample, y=log2_counts, fill=gene_biotype)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = "triangle")
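
The outlier shape can also be given as an integer code instead of a name. This is a minimal sketch that uses the mpg dataset bundled with ggplot2 as a stand-in for rnaseq2 (an assumption made only so the example runs on its own). Code 17 is the filled triangle, the same shape that the name "triangle" refers to.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for rnaseq2 here.
p <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 17)  # 17 = filled triangle

p
```

Integer codes may be the safer choice on older ggplot2 installations, where shape names given as strings might not be recognised.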