8.7 Exercise 2

1. Import the file DataViz_source_files-main/files/gencode.v44.annotation.csv into an object called gtf.
correction
# read_csv() comes from the readr package, which is loaded with the tidyverse
gtf <- read_csv("DataViz_source_files-main/files/gencode.v44.annotation.csv")

This is a small subset of the GENCODE v44 human gene annotation, created as follows:

  • Selection of protein coding genes, long non-coding genes, miRNAs, snRNAs and snoRNAs.
  • Selection of chromosomes 1 to 10 only.
  • Creation of a random subset of 1000 genes.
  • Conversion to a user-friendly CSV format.
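
The steps listed above can be sketched in base R as follows. This is only an illustration, not the script actually used to build the file: the toy table `full`, the gene type labels, the "chr1"-style chromosome names and the output file name are all assumptions, since the original annotation and its column values are not shown here.

```r
# Hypothetical toy annotation standing in for the full gencode table
set.seed(42)  # make the random draws reproducible
full <- data.frame(
  gene_name = paste0("gene", 1:5000),
  chr       = sample(paste0("chr", c(1:22, "X", "Y")), 5000, replace = TRUE),
  gene_type = sample(c("protein_coding", "lncRNA", "miRNA",
                       "snRNA", "snoRNA", "pseudogene"),
                     5000, replace = TRUE)
)

# Assumed labels for the gene types and chromosomes to keep
keep_types <- c("protein_coding", "lncRNA", "miRNA", "snRNA", "snoRNA")
keep_chrs  <- paste0("chr", 1:10)

# First two bullets: keep the selected gene types and chromosomes 1 to 10
filtered <- full[full$gene_type %in% keep_types & full$chr %in% keep_chrs, ]

# Third bullet: draw a random subset of 1000 genes
subset_1000 <- filtered[sample(nrow(filtered), 1000), ]

# Fourth bullet: write the result as a csv file (temporary location here)
out_file <- file.path(tempdir(), "annotation_subset.csv")
write.csv(subset_1000, out_file, row.names = FALSE)
```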


2. Create a simple barplot representing the count of genes per chromosome:

correction
ggplot(data=gtf, mapping=aes(x=chr)) + 
  geom_bar()
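
To check what the bars represent: geom_bar() counts the rows for each value of x, which is the same tally that table() returns. The sketch below uses a made-up data frame `toy` rather than gtf, and assumes chromosome names of the form "chr1"; the real column may be encoded differently. It also shows one way to control the chromosome order, because character values are sorted alphabetically by default.

```r
# Toy stand-in for gtf; the values are made up for illustration
toy <- data.frame(chr = c("chr1", "chr10", "chr2", "chr1", "chr2", "chr1"))

# One count per distinct chromosome: these are the bar heights
counts <- table(toy$chr)
print(counts)

# Alphabetical sorting places chr10 between chr1 and chr2.
# A factor with explicit levels gives the natural order instead.
toy$chr <- factor(toy$chr, levels = paste0("chr", 1:10))
ordered_counts <- table(toy$chr)
print(ordered_counts)
```

If the x axis of the barplot comes out in an unexpected order, applying the same factor() conversion to the chr column before plotting is one possible fix.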


3. Keep the chromosome represented on the x axis, and split the barplot per gene type.

TIP: remember how we set color= in the mapping=aes() function in the scatter plot section? Give it a try here!

correction
ggplot(data=gtf, mapping=aes(x=chr, color=gene_type)) + 
  geom_bar()


4. Replace color= with fill= in aes(). What changes?

correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar()
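
As a hint for this question: in a barplot, color= controls the outline of the bars, while fill= controls their interior. The sketch below builds both versions on a made-up data frame `toy` (not the real gtf) so they can be compared, and adds a third variant with position = "dodge", which places the gene types side by side instead of stacking them. The object names are chosen for illustration only.

```r
library(ggplot2)

# Toy stand-in for gtf; the values are made up for illustration
toy <- data.frame(
  chr       = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  gene_type = c("protein_coding", "lncRNA", "protein_coding",
                "protein_coding", "miRNA")
)

# color= : only the bar outlines are colored by gene type
p_outline <- ggplot(data = toy, mapping = aes(x = chr, color = gene_type)) +
  geom_bar()

# fill= : the inside of the bars is colored by gene type (stacked by default)
p_filled <- ggplot(data = toy, mapping = aes(x = chr, fill = gene_type)) +
  geom_bar()

# Same mapping, but with the gene types drawn next to each other
p_dodged <- ggplot(data = toy, mapping = aes(x = chr, fill = gene_type)) +
  geom_bar(position = "dodge")
```

Printing p_outline, p_filled and p_dodged one after the other makes the difference visible.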


5. Add a title to the graph:

correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type")


6. Change the default theme:

correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type") +
  theme_bw()


7. Save the graph in PNG format in the workshop’s directory.

correction
# save plot in an object
gtfbars <- ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type") +
  theme_bw()

# save as PNG file
ggsave(filename="gtfbarplot.png", plot=gtfbars, 
       device="png")
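
By default, ggsave() writes to the current working directory and uses the size of the current graphics device. The sketch below shows how the destination folder and the image size could be set explicitly with the path=, width=, height=, units= and dpi= arguments. The toy data, the object names and the use of a temporary directory are assumptions for illustration; replace out_dir with the workshop's directory.

```r
library(ggplot2)

# Toy stand-in for gtf; the values are made up for illustration
toy <- data.frame(
  chr       = c("chr1", "chr1", "chr2"),
  gene_type = c("protein_coding", "lncRNA", "miRNA")
)

toybars <- ggplot(data = toy, mapping = aes(x = chr, fill = gene_type)) +
  geom_bar() +
  theme_bw()

# Hypothetical output folder; a temporary directory is used here
out_dir <- tempdir()

# Save as a 7 x 5 inch PNG at 300 dpi in the chosen folder
ggsave(filename = "toybarplot.png", plot = toybars, device = "png",
       path = out_dir, width = 7, height = 5, units = "in", dpi = 300)

out_file <- file.path(out_dir, "toybarplot.png")
```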