21.2 From wide to long format

The wide format is what you would typically have in a table with measurements, such as genes in rows and samples in columns.

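However, as we have seen, ggplot2 often requires data in long format: each aesthetic (x, y, colour, ...) is mapped to a single column, so sample names and expression values must each sit in one column.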

In long format, one row corresponds to one observation (measurement), together with all the information associated with it.

{tidyr} provides pivot_longer() to convert from wide to long format, and pivot_wider() to convert from long to wide format.
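To see both functions side by side, here is a minimal sketch on a small hypothetical table (the names toy_wide, gene, s1 and s2 are made up for illustration): pivot_longer() gathers the sample columns into rows, and pivot_wider() spreads them back into columns.

```r
library(tidyr)
library(tibble)

# Hypothetical toy table: two genes (rows) measured in two samples (columns)
toy_wide <- tibble(
  gene = c("geneA", "geneB"),
  s1   = c(1.5, 2.0),
  s2   = c(3.1, 0.7)
)

# Wide -> long: one row per gene/sample measurement
toy_long <- pivot_longer(toy_wide, cols = c(s1, s2),
                         names_to = "sample", values_to = "value")

# Long -> wide: spread the sample names back into columns
toy_back <- pivot_wider(toy_long, names_from = sample, values_from = value)

toy_long
toy_back
```

The long version has 4 rows (2 genes × 2 samples), and pivoting it back recovers the original wide table.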

Our object geneexp is in wide format: the expression values are spread over several columns, one per sample (WT1 to WT5 and KO1 to KO8):

## # A tibble: 5 × 17
##   GeneSymbol log2FoldChange    padj DE      WT1   WT2   WT3   WT4   WT5   KO1   KO2   KO3   KO4
##   <chr>               <dbl>   <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Adh7                0.715 0.00337 UP     8.33  7.25  8.22  7.44  7.87  8.88  8.78  8.74  8.54
## 2 Cdk5r1             -0.564 0.0364  DOWN   7.96  7.41  8.21  8.07  8.00  7.31  6.87  7.31  7.24
## 3 Bcat1               0.702 0.0128  UP    10.0   8.90  8.83  9.01  9.29 10.2  10.1  10.0  10.1 
## 4 Asb1                0.598 0.0147  UP     7.49  7.22  6.69  7.60  6.93  7.71  7.50  8.24  7.81
## 5 AU020206            0.545 0.0104  UP     7.66  7.71  7.13  7.69  7.42  8.08  8.41  7.87  8.23
## # ℹ 4 more variables: KO5 <dbl>, KO6 <dbl>, KO7 <dbl>, KO8 <dbl>
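If you want to follow along without the full dataset, the sketch below builds a small hypothetical excerpt of geneexp (the name geneexp_small is our own; the values are copied, as rounded, from the printout above). It also lists which columns the pattern "WT|KO" picks up. We use base R grep() here as a stand-in for matches(), which interprets its pattern as a regular expression and ignores case by default.

```r
library(tibble)

# Hypothetical excerpt of geneexp: two genes and four samples,
# with values copied (as rounded in the printout) from the table above
geneexp_small <- tibble(
  GeneSymbol = c("Adh7", "Cdk5r1"),
  DE         = c("UP", "DOWN"),
  WT1        = c(8.33, 7.96),
  WT2        = c(7.25, 7.41),
  KO1        = c(8.88, 7.31),
  KO2        = c(8.78, 6.87)
)

# Base R equivalent of the column selection made by matches("WT|KO"):
# a case-insensitive regular-expression match on the column names
expr_cols <- grep("WT|KO", names(geneexp_small), value = TRUE, ignore.case = TRUE)
expr_cols
```

Only the sample columns match; GeneSymbol and DE are left out and will be kept as identifier columns when pivoting.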

To convert to long format, we pivot the data frame (specifically, the columns that contain expression data) and create:

  • One column that contains the sample names
  • One column that contains all expression values

pivot_longer(geneexp, cols = matches("WT|KO"))
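As a minimal sketch of what this call returns, we can run it on the hypothetical two-gene excerpt geneexp_small (rebuilt here so the block runs on its own). The columns that are not pivoted are repeated on every row, and the two new columns receive the default names name and value.

```r
library(tidyr)
library(tibble)

# Hypothetical two-gene excerpt of geneexp, standing in for the full object
geneexp_small <- tibble(
  GeneSymbol = c("Adh7", "Cdk5r1"),
  DE         = c("UP", "DOWN"),
  WT1        = c(8.33, 7.96),
  WT2        = c(7.25, 7.41),
  KO1        = c(8.88, 7.31),
  KO2        = c(8.78, 6.87)
)

# No names_to/values_to given: new columns are called "name" and "value"
long_default <- pivot_longer(geneexp_small, cols = matches("WT|KO"))

# 2 genes x 4 samples = 8 rows; GeneSymbol and DE are repeated on each row
long_default
```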

By default, pivot_longer() names the new columns name and value. We can choose other names as we create them, with names_to and values_to:

pivot_longer(geneexp, cols = matches("WT|KO"),
             names_to = "samples", 
             values_to = "expression")
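Once the data are in long format, samples and expression are each a single column, so ggplot2 can map them directly to aesthetics. A minimal sketch, again on the hypothetical excerpt geneexp_small, with a point plot chosen only for illustration:

```r
library(tidyr)
library(tibble)
library(ggplot2)

# Hypothetical two-gene excerpt of geneexp, standing in for the full object
geneexp_small <- tibble(
  GeneSymbol = c("Adh7", "Cdk5r1"),
  DE         = c("UP", "DOWN"),
  WT1        = c(8.33, 7.96),
  WT2        = c(7.25, 7.41),
  KO1        = c(8.88, 7.31),
  KO2        = c(8.78, 6.87)
)

# Pivot with explicit names for the new columns
expr_long <- pivot_longer(geneexp_small, cols = matches("WT|KO"),
                          names_to = "samples",
                          values_to = "expression")

# Each aesthetic maps to one column: sample on x, expression on y, gene as colour
p <- ggplot(expr_long, aes(x = samples, y = expression, colour = GeneSymbol)) +
  geom_point()
p
```

Because samples is a character column, ggplot2 orders it alphabetically on the x-axis (KO before WT); converting it to a factor with the desired levels would control that order.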