Exercise 2
1. Import the file DataViz_source_files-main/files/gencode.v44.annotation.csv into R, in an object called gtf.
correction
library(tidyverse)  # provides read_csv() (readr) and ggplot() (ggplot2)
gtf <- read_csv("DataViz_source_files-main/files/gencode.v44.annotation.csv")
This is a small subset of the gencode v44 human gene annotation:
- Only protein-coding genes, long non-coding RNAs, miRNAs, snRNAs and snoRNAs
- Limited to chromosomes 1 to 10
- Random subset of 1000 genes
- Converted to a friendly CSV format.
2. Create a simple barplot representing the count of genes per chromosome:
correction
ggplot(data=gtf, mapping=aes(x=chr)) +
  geom_bar()
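To see what geom_bar() is drawing, it can help to compute the same counts by hand. The sketch below uses a small made-up data frame called toy (an assumption for illustration; only the chr column name comes from the exercise) and compares a plain table() tally with the bar heights that ggplot2 computes.

```r
library(ggplot2)

# Toy stand-in for gtf (made-up values; only the chr column name comes from the exercise)
toy <- data.frame(chr = c("chr1", "chr1", "chr2", "chr3", "chr3", "chr3"))

# Number of rows per chromosome: geom_bar() computes this tally by default (stat = "count")
chr_counts <- table(toy$chr)
print(chr_counts)

# Build the same barplot and extract the bar heights computed by ggplot2
p <- ggplot(data = toy, mapping = aes(x = chr)) +
  geom_bar()
bar_heights <- ggplot_build(p)$data[[1]]$count
print(bar_heights)
```

With the real data, table(gtf$chr) should give the same numbers as the heights of the bars in the plot above.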
3. Keeping the chromosome on the x axis, split the barplot per gene type.
TIP: remember how we set color= in the mapping=aes() function in the scatter plot section? Give it a try here!
correction
ggplot(data=gtf, mapping=aes(x=chr, color=gene_type)) +
  geom_bar()
4. Now try with fill instead of color in aes():
correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) +
  geom_bar()
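The difference between the last two plots is that color= only affects the outline of the bars, while fill= affects their inside. A minimal sketch on a made-up data frame (toy and its values are assumptions for illustration; only the column names chr and gene_type come from the exercise) makes this visible by inspecting the colours ggplot2 assigns:

```r
library(ggplot2)

# Toy stand-in for gtf (made-up values; only the column names come from the exercise)
toy <- data.frame(
  chr       = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  gene_type = c("protein_coding", "lncRNA", "protein_coding", "protein_coding", "lncRNA")
)

# color= maps gene_type to the bar outlines only
p_color <- ggplot(data = toy, mapping = aes(x = chr, color = gene_type)) +
  geom_bar()

# fill= maps gene_type to the inside of the bars
p_fill <- ggplot(data = toy, mapping = aes(x = chr, fill = gene_type)) +
  geom_bar()

# Inspect the colours ggplot2 assigned to the bar segments
outline_colours <- unique(ggplot_build(p_color)$data[[1]]$colour)
fill_colours    <- unique(ggplot_build(p_fill)$data[[1]]$fill)
print(outline_colours)
print(fill_colours)
```

In both cases there is one colour per gene type; only the part of the bar that receives it changes.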
5. Add a title to the graph:
correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) +
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type")
6. Change the default theme:
correction
ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) +
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type") +
  theme_bw()
7. Save the graph in PNG format in the course’s directory.
correction
# save plot in an object
gtfbars <- ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) +
  geom_bar() +
  ggtitle(label = "Number of genes per chromosome, split by gene type") +
  theme_bw()
# save as PNG file
ggsave(filename="gtfbarplot.png", plot=gtfbars,
       device="png")
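Note that ggsave() writes the file relative to the current working directory unless told otherwise, and uses the size of the current graphics device by default. As a hedged sketch, the example below saves a made-up toy plot into an explicit directory with a fixed size and resolution. The data, the file name toy_barplot.png and the use of tempdir() are all placeholders for illustration; replace tempdir() with the path to your course directory.

```r
library(ggplot2)

# Toy plot standing in for gtfbars (made-up data)
toy <- data.frame(chr = c("chr1", "chr1", "chr2"))
toy_plot <- ggplot(data = toy, mapping = aes(x = chr)) +
  geom_bar()

# tempdir() is a placeholder; replace it with the path to your course directory
out_dir <- tempdir()

# width/height (in the given units) and dpi set the size and resolution of the file
ggsave(filename = "toy_barplot.png", plot = toy_plot, path = out_dir,
       width = 15, height = 10, units = "cm", dpi = 300)

# Check that the file was written where we expect it
out_file <- file.path(out_dir, "toy_barplot.png")
file.exists(out_file)
```

Setting width, height and dpi explicitly makes the saved figure reproducible, whatever the size of the plotting window.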