11.3 Exercise 7: boxplot

  1. Read in DataViz_source_files-main/files/GSE150029_rnaseq_log2_long.csv into a new object called rnaseq2.
correction
# read_csv() comes from the readr package, which is loaded with the tidyverse
rnaseq2 <- read_csv("DataViz_source_files-main/files/GSE150029_rnaseq_log2_long.csv")


  2. Create a boxplot that shows the samples on the x axis and their expression on the y axis.
correction
ggplot(data=rnaseq2, mapping=aes(x=sample, y=log2_counts)) + 
  geom_boxplot()


  3. Split the boxes by gene_biotype.
correction
ggplot(data=rnaseq2, mapping=aes(x=sample, y=log2_counts, fill=gene_biotype)) + 
  geom_boxplot()


  4. Keep only the protein_coding and lincRNA biotypes (you can save the filtered data into a new object) and redo the plot from step 3.
correction
rnaseq2_filtered <- filter(rnaseq2, gene_biotype=="protein_coding" | gene_biotype=="lincRNA")

ggplot(data=rnaseq2_filtered, mapping=aes(x=sample, y=log2_counts, fill=gene_biotype)) + 
  geom_boxplot()
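
As a side note, the same filter can also be written with %in%, which is easier to extend when more biotypes need to be kept. The sketch below runs on a small made-up tibble (the object toy and its values are invented for illustration and are not part of the course data) and checks that both forms select the same rows.

```r
library(dplyr)

# Toy data standing in for rnaseq2; the values are invented for illustration only.
toy <- tibble(
  sample       = c("s1", "s1", "s1", "s2"),
  gene_biotype = c("protein_coding", "lincRNA", "miRNA", "protein_coding"),
  log2_counts  = c(5.2, 1.1, 0.3, 4.8)
)

# Biotypes to keep
keep <- c("protein_coding", "lincRNA")

# Filter with %in% ...
toy_in <- filter(toy, gene_biotype %in% keep)

# ... and with the explicit "or" used in the correction
toy_or <- filter(toy, gene_biotype == "protein_coding" | gene_biotype == "lincRNA")

# Compare the two results
all.equal(toy_in, toy_or)
```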


  5. Add a geom_violin() layer and set alpha=0.3 in geom_violin(). What does the alpha parameter do?
correction
ggplot(data=rnaseq2_filtered, mapping=aes(x=sample, y=log2_counts, fill=gene_biotype)) + 
  geom_boxplot() +
  geom_violin(alpha=0.3)

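# alpha sets the transparency of the layer: 0 is fully transparent, 1 is fully opaque.
# With alpha=0.3, the boxplots remain visible through the violins drawn on top of them.

# If the boxplots and violins are misaligned, you can adjust the position parameter in geom_violin(), for example:
# geom_violin(position=position_dodge(0.7))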
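
One possible way to keep the two layers aligned is to give both geoms the same dodge width explicitly. The sketch below is only an illustration on simulated data (the data frame toy, the dodge width of 0.9 and the box width of 0.2 are assumptions chosen for the example, not values from the exercise).

```r
library(ggplot2)

# Simulated data standing in for rnaseq2_filtered; values are random, for illustration only.
set.seed(1)
toy <- data.frame(
  sample       = rep(c("s1", "s2"), each = 100),
  gene_biotype = rep(c("protein_coding", "lincRNA"), times = 100),
  log2_counts  = rnorm(200, mean = 5)
)

# Use the same dodge width in both layers so boxes and violins share the same centers.
dodge <- position_dodge(width = 0.9)

p <- ggplot(data = toy, mapping = aes(x = sample, y = log2_counts, fill = gene_biotype)) +
  geom_boxplot(width = 0.2, position = dodge) +  # narrow boxes
  geom_violin(alpha = 0.3, position = dodge)     # semi-transparent violins on top

p
```

Narrowing the boxes with width is optional; it simply makes them easier to read through the violins.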


  6. Look at the help page of geom_boxplot() and change the following parameters:
  • Set outlier color to red
  • Set outlier shape as triangles
correction
ggplot(data=rnaseq2_filtered, mapping=aes(x=sample, y=log2_counts, fill=gene_biotype)) + 
  geom_boxplot(outlier.colour = "red", outlier.shape = "triangle")
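
Shapes can also be given as numeric codes instead of names: 17 is a filled triangle and 2 is an open triangle. The sketch below uses simulated data with two outliers added by hand (the data frame toy and its values are invented for illustration) to show the numeric form.

```r
library(ggplot2)

# Simulated data; values are random, for illustration only.
set.seed(42)
toy <- data.frame(
  sample      = rep(c("s1", "s2"), each = 50),
  log2_counts = c(rnorm(50), rnorm(50, mean = 1))
)

# Force one extreme value per sample so that each box has an outlier to draw.
toy$log2_counts[c(1, 51)] <- c(8, -7)

p <- ggplot(data = toy, mapping = aes(x = sample, y = log2_counts)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 17)  # 17 = filled triangle

p
```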