Exercise 6: scatter plot
1. Import DataViz_source_files-main/files/GSE150029_rnaseq_log2.csv into an object called rnaseq.
correction
library(tidyverse)  # provides read_csv(), filter(), %>% and ggplot(); skip if already loaded
rnaseq <- read_csv("DataViz_source_files-main/files/GSE150029_rnaseq_log2.csv")
2. Create a scatter plot with sample CTRL on the x axis and sample EZH on the y axis.
correction
ggplot(data=rnaseq, mapping=aes(x=CTRL, y=EZH)) +
  geom_point()
3. Color the points according to gene_biotype.
correction
ggplot(data=rnaseq, mapping=aes(x=CTRL, y=EZH, color=gene_biotype)) +
  geom_point()
4. Not very readable! Filter the data and plot only the rows whose gene_biotype is either lincRNA OR miRNA.
correction
ggplot(data=filter(rnaseq, gene_biotype=="lincRNA" | gene_biotype=="miRNA"), mapping=aes(x=CTRL, y=EZH, color=gene_biotype)) +
  geom_point()
# using the pipe:
filter(rnaseq, gene_biotype=="lincRNA" | gene_biotype=="miRNA") %>%
  ggplot(mapping=aes(x=CTRL, y=EZH, color=gene_biotype)) +
  geom_point()
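As a side note, the same OR filter can also be written with %in%, which stays compact when more biotypes are added. The sketch below uses a small made-up data frame (toy, with invented values, not taken from the GSE150029 file) so that it runs on its own; only the column names mirror rnaseq.

```r
library(dplyr)

# Toy data invented for illustration; only the column names mirror rnaseq
toy <- data.frame(
  gene_biotype = c("lincRNA", "miRNA", "protein_coding", "miRNA"),
  CTRL = c(5.1, 3.2, 8.4, 2.0),
  EZH  = c(4.0, 3.5, 8.1, 1.1)
)

# Two equivalent ways of keeping lincRNA and miRNA rows
with_or <- filter(toy, gene_biotype == "lincRNA" | gene_biotype == "miRNA")
with_in <- filter(toy, gene_biotype %in% c("lincRNA", "miRNA"))

# Both keep the same rows, in the same order
all(with_or$CTRL == with_in$CTRL)
```

On rnaseq itself, this would read filter(rnaseq, gene_biotype %in% c("lincRNA", "miRNA")).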
5. Now select only those lincRNAs and miRNAs whose value in CTRL is more than 1.5 times their value in EZH.
correction
filter(rnaseq, (gene_biotype=="lincRNA" | gene_biotype=="miRNA") & CTRL > 1.5*EZH) %>%
  ggplot(mapping=aes(x=CTRL, y=EZH, color=gene_biotype)) +
  geom_point()
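The parentheses around the two gene_biotype tests in this filter matter: in R, & has higher precedence than |, so without them the CTRL condition would apply to the miRNA rows only. A minimal sketch on a made-up data frame (toy, with invented values) shows the difference:

```r
library(dplyr)

# Toy data invented for illustration; only the column names mirror rnaseq
toy <- data.frame(
  gene_biotype = c("lincRNA", "lincRNA", "miRNA", "miRNA", "protein_coding"),
  CTRL = c(6, 2, 6, 2, 6),
  EZH  = c(2, 2, 2, 2, 2)
)

# Intended filter: (lincRNA OR miRNA) AND CTRL > 1.5 * EZH
with_parens <- filter(toy, (gene_biotype == "lincRNA" | gene_biotype == "miRNA") & CTRL > 1.5 * EZH)

# Without parentheses R reads: lincRNA OR (miRNA AND CTRL > 1.5 * EZH)
without_parens <- filter(toy, gene_biotype == "lincRNA" | gene_biotype == "miRNA" & CTRL > 1.5 * EZH)

nrow(with_parens)     # 2 rows: one lincRNA and one miRNA pass the CTRL test
nrow(without_parens)  # 3 rows: the lincRNA with CTRL == 2 slips through
```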
6. Add a title to the plot, and make it bold (see the theme() section of the course).
correction
filter(rnaseq, (gene_biotype=="lincRNA" | gene_biotype=="miRNA") & CTRL > 1.5*EZH) %>%
  ggplot(mapping=aes(x=CTRL, y=EZH, color=gene_biotype)) +
  geom_point() +
  ggtitle("lincRNA and miRNA") +
  theme(plot.title = element_text(face = "bold"))
NOTE: If you want to label only one (or a few) point(s), you can do it the following way:
First, filter the data frame to keep only the row(s) to label:
SNHG8 <- filter(rnaseq, gene_name=="SNHG8")
Then pass it to geom_text() through the data argument:
filter(rnaseq, (gene_biotype=="lincRNA" | gene_biotype=="miRNA") & CTRL > 1.5*EZH) %>%
  ggplot(mapping=aes(x=CTRL, y=EZH, color=gene_biotype)) +
  geom_point() +
  ggtitle("lincRNA and miRNA") +
  theme(plot.title = element_text(face = "bold")) +
  geom_text(data=SNHG8, label="SNHG8", show.legend = FALSE)
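To label a few points instead of one, a possible variant maps the label aesthetic to gene_name, so that each row of the filtered data frame carries its own text. This is only a sketch on a made-up data frame: toy, its values, and the gene names geneA to geneD are hypothetical and do not come from the GSE150029 file.

```r
library(dplyr)
library(ggplot2)

# Toy data invented for illustration; gene names and values are hypothetical
toy <- data.frame(
  gene_name    = c("geneA", "geneB", "geneC", "geneD"),
  gene_biotype = c("lincRNA", "miRNA", "lincRNA", "miRNA"),
  CTRL = c(6.0, 4.5, 7.2, 3.1),
  EZH  = c(2.0, 2.5, 3.0, 1.0)
)

# Keep only the rows that should receive a label
to_label <- filter(toy, gene_name %in% c("geneA", "geneC"))

p <- ggplot(data = toy, mapping = aes(x = CTRL, y = EZH, color = gene_biotype)) +
  geom_point() +
  # label is taken from the gene_name column; x, y and color are inherited
  geom_text(data = to_label, mapping = aes(label = gene_name),
            vjust = -1, show.legend = FALSE)

p
```

Here vjust = -1 simply lifts the text above the point so that the two do not overlap.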