8.8 Barplots: bars position

We can also play with the position of the bars. By default, position is "stack", i.e. categories are stacked on top of each other within each bar.

Position fill will show the proportions, instead of the absolute values, of each category:

ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar(position="fill")
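To see what position="fill" displays, the proportions can be reproduced by hand. The sketch below uses base R only and a small made-up data frame (toy, a hypothetical stand-in for gtf with the same two column names); it is only meant to illustrate the idea:

```r
# Hypothetical toy data standing in for gtf (not the real annotation).
toy <- data.frame(
  chr       = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  gene_type = c("protein_coding", "protein_coding", "protein_coding", "lncRNA",
                "protein_coding", "lncRNA")
)

# Absolute counts per chromosome and gene type: what position="stack" displays.
counts <- table(toy$chr, toy$gene_type)

# Proportions within each chromosome (each row sums to 1): what position="fill" displays.
props <- prop.table(counts, margin = 1)
print(props)
```

With position="fill", every bar has the same total height (1), so only the relative share of each category can be compared between bars.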

Position dodge places the bars of each category (here, gene types) next to each other:

ggplot(data=gtf, mapping=aes(x=chr, fill=gene_type)) + 
  geom_bar(position="dodge")

More advanced (as reference, or if someone asks): how to reorder x-axis labels:

Factors are a data type in R: they are used to represent categorical data. Using factors requires a bit more understanding of how R works, but here is an application:

ggplot(data=gtf,
       mapping=aes(x=factor(chr,
                            levels=c("chr1", "chr2", "chr3", "chr4", "chr5",
                                     "chr6", "chr7", "chr8", "chr9", "chr10"),
                            ordered=TRUE),
                   fill=gene_type)) + 
  geom_bar(position="dodge") +
  xlab("chromosome")
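The levels have to be spelled out because, by default, R sorts character values alphabetically, so "chr10" ends up before "chr2". A minimal base R sketch of the difference, using a made-up vector (chrs is hypothetical, not taken from gtf):

```r
# Hypothetical chromosome names, not taken from gtf.
chrs <- c("chr2", "chr10", "chr1", "chr2")

# Default factor: levels are sorted alphabetically, so chr10 comes before chr2.
levels(factor(chrs))
# "chr1"  "chr10" "chr2"

# Explicit levels: the order given here is the order used on the x-axis.
chr_ordered <- factor(chrs, levels = c("chr1", "chr2", "chr10"))
levels(chr_ordered)
# "chr1"  "chr2"  "chr10"

# Counts follow the level order, as the bars would.
table(chr_ordered)
```

ggplot2 draws the categories of a discrete axis in the order of the factor levels, which is why setting levels changes the order of the bars.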

8.8.1 stat="identity" parameter

geom_bar() can work a bit differently when the bar heights are already available as numbers, instead of having to be counted from categories.

By default, geom_bar() counts the number of occurrences of each value found in x: it does not expect a y value. The default is stat="count".

If you set the stat parameter to "identity", ggplot2 skips the aggregation, and the values used for the bar heights are provided by the user in y.
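The two behaviours can be compared on a tiny example. This is only a sketch: the data frame obs and its fruit column are made up for illustration, and the ggplot2 package is assumed to be installed:

```r
library(ggplot2)

# Hypothetical toy data: one row per observation.
obs <- data.frame(fruit = c("apple", "apple", "pear", "apple", "pear"))

# stat="count" (the default): ggplot2 counts the rows for each value of x.
p_count <- ggplot(obs, aes(x = fruit)) +
  geom_bar()

# Same counts computed beforehand, one row per category (columns: fruit, n).
counted <- as.data.frame(table(fruit = obs$fruit), responseName = "n")

# stat="identity": bar heights are taken directly from the y column.
p_identity <- ggplot(counted, aes(x = fruit, y = n)) +
  geom_bar(stat = "identity")
```

Both plots show the same bars (3 for apple, 2 for pear). Note that geom_col() is ggplot2's shorthand for geom_bar(stat="identity").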

Let’s import data from file: DataViz_source_files-main/files/stats_continents_barcelona_2013-2023_long.csv into an object called statsbcn.

The data contains the number of foreign residents in Barcelona from 2013 to 2023.

library(readr)  # provides read_csv()
statsbcn <- read_csv("DataViz_source_files-main/files/stats_continents_barcelona_2013-2023_long.csv")

How many rows and how many columns does the data contain?

In the barplots we created so far, R takes the categories in the column specified in x= and counts the number of occurrences of each.

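The argument stat="identity" in geom_bar() tells ggplot2 to use the values of the variable specified in y= as bar heights, without counting anything. When several rows share the same x value, their bars are stacked (the default position), so each bar reaches the sum of y for that x value.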

In the following example, we are plotting the sum of foreign residents in Barcelona (Population provided in y) per year (Year provided in x):

ggplot(statsbcn, aes(x=Year, y=Population)) + 
  geom_bar(stat="identity")
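With stat="identity", each row of the data is drawn as its own rectangle; rows that share the same Year are stacked, so the top of each bar lands at the sum of their Population values. Those sums can be checked by hand. The sketch below uses a made-up data frame (toy) that only borrows the column names of statsbcn; the values are invented:

```r
# Hypothetical toy data with the same column names as statsbcn (values are made up).
toy <- data.frame(
  Year       = c(2013, 2013, 2014, 2014),
  Continent  = c("Africa", "Asia", "Africa", "Asia"),
  Population = c(10, 20, 15, 25)
)

# Total per year: the height that each stacked bar reaches.
totals <- aggregate(Population ~ Year, data = toy, FUN = sum)
print(totals)
```

Running the same aggregate() call on statsbcn should give the bar heights of the plot above.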

Here, we can provide Continent to fill:

ggplot(statsbcn, aes(x=Year, y=Population, fill=Continent)) + 
  geom_bar(stat="identity")

Here again, we can play with the position.

Position fill:

ggplot(statsbcn, aes(x=Year, y=Population, fill=Continent)) + 
  geom_bar(stat="identity", position="fill")

Position dodge:

ggplot(statsbcn, aes(x=Year, y=Population, fill=Continent)) + 
  geom_bar(stat="identity", position="dodge")

You can control the width of bars (hence, the spacing between 2 bars) using the width parameter of geom_bar():

ggplot(statsbcn, aes(x=Year, y=Population, fill=Continent)) + 
  geom_bar(stat="identity", position="dodge", width = 0.8)

More advanced (as reference, or if someone asks): display all labels:

Convert the Year column to character instead of numeric, so that the x-axis becomes discrete and every year gets its own label:

ggplot(statsbcn, aes(x=as.character(Year), y=Population, fill=Continent)) + 
  geom_bar(stat="identity", position="dodge")
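One possible alternative, if Year should stay numeric, is to set the axis breaks explicitly with scale_x_continuous(). The sketch below assumes ggplot2 is installed and uses a made-up data frame (toy, with invented values) rather than statsbcn:

```r
library(ggplot2)

# Hypothetical toy data (values are made up), with Year stored as numbers.
toy <- data.frame(Year = 2013:2016, Population = c(10, 12, 15, 11))

# One break per distinct year.
year_breaks <- sort(unique(toy$Year))

# Year stays numeric; the breaks are set explicitly on the continuous x-axis.
p <- ggplot(toy, aes(x = Year, y = Population)) +
  geom_bar(stat = "identity") +
  scale_x_continuous(breaks = year_breaks)
```

Compared with as.character(), this keeps the spacing between bars proportional to the actual distance between years, which matters if some years are missing from the data.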