8.8 Barplots: bar position

We can also change the position of the bars. By default, position is stack, i.e. categories are stacked on top of each other within each bar.

Position fill rescales each bar so that its top is always 1, i.e. it shows proportions instead of absolute counts:

ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar(position="fill")

Position dodge represents each category (here, gene types) side-by-side:

ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar(position="dodge")
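To see what each position does to the bar heights, here is a minimal sketch on a small made-up data frame. The toy object and its values are assumptions for illustration only and are not the course's gtf data. layer_data() is a ggplot2 helper that returns the values computed for drawing a layer:

```r
library(ggplot2)

# Hypothetical toy data, one row per gene (NOT the course's gtf object)
toy <- data.frame(
  chr       = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  gene_type = c("protein_coding", "protein_coding", "lncRNA",
                "protein_coding", "lncRNA")
)

# Same data and mapping, three different positions
p_stack <- ggplot(toy, aes(x = chr, fill = gene_type)) + geom_bar(position = "stack")
p_fill  <- ggplot(toy, aes(x = chr, fill = gene_type)) + geom_bar(position = "fill")
p_dodge <- ggplot(toy, aes(x = chr, fill = gene_type)) + geom_bar(position = "dodge")

# Highest point reached by any bar in each plot
top_stack <- max(layer_data(p_stack)$ymax)  # total number of genes on the fullest chromosome
top_fill  <- max(layer_data(p_fill)$ymax)   # proportions: every bar ends at 1
top_dodge <- max(layer_data(p_dodge)$ymax)  # largest single category count
```

In this toy example, chr1 has three genes, two of which are protein_coding: the stacked bar reaches 3, the filled bar reaches 1, and the tallest dodged bar reaches 2.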

More advanced (as reference, or if someone asks): how to reorder x-axis labels:

Factors are a data type in R used to represent categorical data. Using factors requires a bit more understanding of how R works, but here is an application that uses a factor with explicitly ordered levels:

ggplot(data=gtf, mapping=aes(x=factor(chr, levels=c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10"), ordered=TRUE), fill=gene_type)) + 
  geom_bar(position="dodge") +
  xlab("chromosome")
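Why the explicit levels are needed can be checked outside of ggplot2. The short sketch below uses a made-up character vector (an assumption, not taken from the gtf data) to show that, by default, factor levels are sorted alphabetically, and that paste0() can build the list of levels without typing each name:

```r
# Hypothetical vector of chromosome names
chr <- c("chr10", "chr2", "chr1", "chr2")

# Default: levels are sorted alphabetically, so "chr10" comes before "chr2"
default_levels <- levels(factor(chr))

# Explicit levels: "chr1", "chr2", ..., "chr10", built with paste0()
chr_levels  <- paste0("chr", 1:10)
chr_ordered <- factor(chr, levels = chr_levels)

# The values keep their original order; only the level order changes
first_levels <- levels(chr_ordered)[1:3]
```

ggplot2 places discrete categories on the axis in the order of the factor levels, so setting levels= should be enough to reorder the axis; ordered=TRUE additionally declares that the categories have a rank.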

8.8.1 stat="identity" parameter

stat represents a statistical transformation of the data. It typically aims to summarize the data.

geom_bar() provides different options for stat:

  • count (default): counts the number of occurrences of each value / category in x. It does not expect an input in y.
  • identity: uses the data as is (i.e. no transformation is applied) and skips the aggregation. The categories are provided by the user in x, and the heights of the bars are provided in y.
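The two options can be compared on a small example. In the sketch below (toy data and object names are assumptions for illustration), the same barplot is produced twice: once by letting geom_bar() count the rows, and once by counting beforehand with table() and passing the result with stat="identity":

```r
library(ggplot2)

# Hypothetical toy data: one row per observation
toy <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2", "chr2"))

# stat = "count" (default): only x is given, ggplot2 counts the rows per category
p_count <- ggplot(toy, aes(x = chr)) +
  geom_bar()

# stat = "identity": we count ourselves, then provide the heights in y
counts <- as.data.frame(table(chr = toy$chr), responseName = "n")
p_identity <- ggplot(counts, aes(x = chr, y = n)) +
  geom_bar(stat = "identity")

# Bar heights used by each plot
heights_count    <- layer_data(p_count)$y
heights_identity <- layer_data(p_identity)$y
```

Both plots should show the same bars; the difference is only in who does the counting.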

Let’s import the data from the file DataViz_source_files-main/files/stats_continents_barcelona_2013-2023_long.csv into an object called statsbcn.

The data contains the number of foreign residents in Barcelona from 2013 to 2023.

# read_csv() comes from the readr package (loaded with the tidyverse)
statsbcn <- read_csv("DataViz_source_files-main/files/stats_continents_barcelona_2013-2023_long.csv")

How many rows and how many columns does the data contain?

In the barplots we created so far, ggplot2 takes the categories in the column specified in x= and counts the number of occurrences of each.

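If we now set stat="identity" in geom_bar(), no counting is done: each row is drawn with the height given in y=. With the default stack position, rows that share the same x value are stacked on top of each other, so the total height of each bar is the sum of y for that x value.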

In the following example, we are plotting the sum of foreign residents in Barcelona (Population provided in y) per year (Year provided in x):

ggplot(statsbcn, aes(x=Year, y=Population)) + 
  geom_bar(stat="identity")
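The way stat="identity" builds these bars can be checked on a toy data frame. The sketch below assumes a long format with the same column names as statsbcn (Year, Continent, Population); the values are invented for illustration. It also uses geom_col(), which ggplot2 provides as a shorthand for geom_bar(stat="identity"):

```r
library(ggplot2)

# Hypothetical long-format data (values are made up, not from statsbcn)
toy <- data.frame(
  Year       = c(2013, 2013, 2014, 2014),
  Continent  = c("Africa", "Asia", "Africa", "Asia"),
  Population = c(10, 20, 15, 30)
)

p <- ggplot(toy, aes(x = Year, y = Population)) +
  geom_bar(stat = "identity")

# One rectangle is drawn per input row...
d <- layer_data(p)
n_rect <- nrow(d)

# ...and rectangles with the same x are stacked, so the top of each bar
# equals the sum of Population for that year
bar_tops  <- tapply(d$ymax, d$x, max)
year_sums <- tapply(toy$Population, toy$Year, sum)

# Same plot written with the geom_col() shorthand
p_col <- ggplot(toy, aes(x = Year, y = Population)) +
  geom_col()
```

Here the bar for 2013 ends at 10 + 20 = 30 and the bar for 2014 at 15 + 30 = 45, even though no sum was computed explicitly.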

We can map, for example, fill to Continent:

ggplot(statsbcn, aes(x=Year, y=Population, fill=Continent)) + 
  geom_bar(stat="identity")

We can further play with the position, as previously done.

  • Position fill:
ggplot(statsbcn, aes(x=Year, y=Population, fill=Continent)) + 
  geom_bar(stat="identity", position="fill")

  • Position dodge:
ggplot(statsbcn, aes(x=Year, y=Population, fill=Continent)) + 
  geom_bar(stat="identity", position="dodge")

We can control the width of bars (hence, the spacing between 2 bars) using the width parameter of geom_bar():

ggplot(statsbcn, aes(x=Year, y=Population, fill=Continent)) + 
  geom_bar(stat="identity", position="dodge", width = 0.8)

More advanced (as reference, or if someone asks): display all labels:

Convert the Year column to character instead of numeric, so that every year gets its own axis label:

# convert the x-axis from a continuous to a discrete variable (as.character)
ggplot(statsbcn, aes(x=as.character(Year), y=Population, fill=Continent)) + 
  geom_bar(stat="identity", position="dodge")
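A possible alternative, if Year should stay numeric, is to keep the continuous axis and ask for one break per year with scale_x_continuous(). The sketch below uses a made-up data frame with the same column names as statsbcn (the values and the year_breaks helper are assumptions for illustration):

```r
library(ggplot2)

# Hypothetical long-format data (values are made up, not from statsbcn)
toy <- data.frame(
  Year       = rep(2013:2016, each = 2),
  Continent  = rep(c("Africa", "Asia"), times = 4),
  Population = c(10, 20, 15, 30, 12, 25, 18, 28)
)

# One axis break (and therefore one label) per year present in the data
year_breaks <- sort(unique(toy$Year))

p <- ggplot(toy, aes(x = Year, y = Population, fill = Continent)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_x_continuous(breaks = year_breaks)
```

With as.character(), the years are treated as categories; with scale_x_continuous(), they remain numbers and only the displayed labels change.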