8.7 Exercise 2

1. Import the file DataViz_source_files-main/files/gencode.v44.annotation.csv into R, as an object called gtf.
correction
library(readr)  # provides read_csv()
gtf <- read_csv("DataViz_source_files-main/files/gencode.v44.annotation.csv")

This file is a small subset of the GENCODE v44 human gene annotation:

  • Only protein-coding genes, long non-coding RNAs, miRNAs, snRNAs and snoRNAs
  • Limited to chromosomes 1 to 10
  • Random subset of 1000 genes
  • Converted to a user-friendly CSV format
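
Before plotting, it can help to look at what was imported. The sketch below does not read the course file: it parses a tiny made-up CSV (the toy_csv and toy_gtf names and all row values are hypothetical) that only mimics the two columns used later in this exercise, chr and gene_type. It assumes readr 2.0 or later, where I() marks a string as literal data.

```r
library(readr)
library(dplyr)

# Hypothetical stand-in for the real file: four made-up rows with the
# two columns used in this exercise (chr and gene_type).
toy_csv <- "chr,gene_type
chr1,protein_coding
chr1,lncRNA
chr2,miRNA
chr10,protein_coding"

# I() tells read_csv() that the string is the data itself, not a file path.
toy_gtf <- read_csv(I(toy_csv), show_col_types = FALSE)

glimpse(toy_gtf)           # column names and types
count(toy_gtf, gene_type)  # number of rows per gene type
```

The same two calls, glimpse(gtf) and count(gtf, gene_type), should work on the real gtf object, provided it has these columns.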


2. Create a simple barplot representing the count of genes per chromosome:

correction
ggplot(data=gtf, mapping=aes(x=chr)) + 
  geom_bar()
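
geom_bar() counts the rows for each value of x by itself. As a rough illustration of what happens behind the scenes, the sketch below computes the counts explicitly with dplyr::count() and draws them with geom_col(). The toy_gtf data and the chr_counts and p_counts names are made up for this example.

```r
library(ggplot2)
library(dplyr)

# Made-up rows for illustration only; the real gtf object has many more.
toy_gtf <- tibble(chr = c("chr1", "chr1", "chr2", "chr10", "chr10", "chr10"))

# One row per chromosome, with the number of genes in column n.
chr_counts <- count(toy_gtf, chr)

# geom_col() draws bars from precomputed heights (y = n),
# whereas geom_bar() would compute the same counts from the raw rows.
p_counts <- ggplot(data = chr_counts, mapping = aes(x = chr, y = n)) +
  geom_col()

p_counts
```

In practice geom_bar() is the shorter route; the explicit version is mostly useful when the counts are also needed as a table.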


3. Keeping the chromosome on the x axis, split the barplot per gene type.

TIP: remember how we set color= inside aes() in the scatter plot section? Give it a try here!

correction
ggplot(data=gtf, mapping=aes(x=chr, color=gene_type)) + 
  geom_bar()


4. Now try with fill instead of color in aes():

correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar()
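
For bars, color controls the outline and fill controls the inside. The two can be combined: the sketch below, on made-up data (toy_gtf and p_fill are hypothetical names), maps fill to gene_type inside aes() and sets a constant black outline outside aes().

```r
library(ggplot2)

# Made-up rows for illustration only; the real gtf object has many more.
toy_gtf <- data.frame(
  chr = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  gene_type = c("protein_coding", "lncRNA", "lncRNA", "protein_coding", "miRNA")
)

# fill is mapped to a column, so it goes inside aes().
# color is a fixed value, so it goes outside aes(): every segment
# gets the same black outline.
p_fill <- ggplot(data = toy_gtf, mapping = aes(x = chr, fill = gene_type)) +
  geom_bar(color = "black")

p_fill
```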


5. Add a title to the graph:

correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type")


6. Change the default theme:

correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type") +
  theme_bw()


7. Save the graph in PNG format in the course’s directory.

correction
# save plot in an object
gtfbars <- ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type") +
  theme_bw()

# save as a PNG file (a relative filename is written to the current working directory)
ggsave(filename="gtfbarplot.png", plot=gtfbars, 
       device="png")
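
ggsave() also accepts the image size and resolution. The sketch below is one possible variant, not the expected answer: it uses made-up data, writes to a temporary directory instead of the course's directory, and the width, height and dpi values are arbitrary choices. The toy_gtf, toy_bars and out_file names are hypothetical.

```r
library(ggplot2)

# Made-up rows for illustration only.
toy_gtf <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  gene_type = c("protein_coding", "lncRNA", "miRNA")
)

toy_bars <- ggplot(data = toy_gtf, mapping = aes(x = chr, fill = gene_type)) +
  geom_bar() +
  theme_bw()

# Write to a temporary directory so the example leaves no file behind
# in the project; replace out_file with a path in the course's directory.
out_file <- file.path(tempdir(), "toy_barplot.png")

# width and height are in inches here; dpi sets the resolution.
ggsave(filename = out_file, plot = toy_bars,
       width = 7, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```

The device can usually be left out: ggsave() infers it from the file extension.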