13.2 From wide to long format

The wide format is what you would typically have in a table with measurements, such as genes in rows and samples in columns.

However, we have seen that ggplot2 sometimes requires data to be converted to a long format.

In a long format, each row corresponds to one observation/measurement, together with all the information associated with it.

{tidyr} provides pivot_longer() to convert from wide to long format, and pivot_wider() to convert from long to wide format.

Our object geneexp is in a wide format: two columns, sample1 and sample2, contain the expression values. Its first five rows are:

## # A tibble: 5 × 4
##   Gene  DE    sample1 sample2
##   <chr> <chr>   <dbl>   <dbl>
## 1 DKK1  No       9.06    5.27
## 2 TP53  No       3.57    8.55
## 3 BRCA1 No       7.39    8.24
## 4 AKT3  Down    15.1     1.57
## 5 CCND1 No       6.74   10.1
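
To follow along without the course data, a small stand-in for geneexp can be built by hand. This is only a sketch: it assumes just the five rows printed above, with the expression values copied from the rounded display, so it is not the full object used in this section.

```r
library(tibble)

# Stand-in for geneexp (assumption: only the five rows printed above,
# with values as they appear in the rounded display)
geneexp <- tibble(
  Gene    = c("DKK1", "TP53", "BRCA1", "AKT3", "CCND1"),
  DE      = c("No", "No", "No", "Down", "No"),
  sample1 = c(9.06, 3.57, 7.39, 15.1, 6.74),
  sample2 = c(5.27, 8.55, 8.24, 1.57, 10.1)
)

geneexp
```

With this stand-in, the long-format tables have 10 rows (5 genes, 2 samples each) instead of the 40 obtained from the full object.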

To convert to a long format, we will create two new columns:

  • One column that contains the sample names
  • One column that contains the expression values

By default, pivot_longer() calls these columns name and value:

pivot_longer(geneexp, cols = c("sample1", "sample2"))
## # A tibble: 40 × 4
##    Gene  DE    name    value
##    <chr> <chr> <chr>   <dbl>
##  1 DKK1  No    sample1  9.06
##  2 DKK1  No    sample2  5.27
##  3 TP53  No    sample1  3.57
##  4 TP53  No    sample2  8.55
##  5 BRCA1 No    sample1  7.39
##  6 BRCA1 No    sample2  8.24
##  7 AKT3  Down  sample1 15.1 
##  8 AKT3  Down  sample2  1.57
##  9 CCND1 No    sample1  6.74
## 10 CCND1 No    sample2 10.1 
## # ℹ 30 more rows
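
The cols argument accepts tidyselect expressions, so the sample columns do not have to be listed one by one. The sketch below uses a hypothetical two-row tibble called toy in place of geneexp and shows three selections that should pick the same columns.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for geneexp (two rows from the table above)
toy <- tibble(
  Gene    = c("DKK1", "TP53"),
  DE      = c("No", "No"),
  sample1 = c(9.06, 3.57),
  sample2 = c(5.27, 8.55)
)

# Select the columns by their common prefix
by_prefix <- pivot_longer(toy, cols = starts_with("sample"))

# Select a range of adjacent columns
by_range <- pivot_longer(toy, cols = sample1:sample2)

# Select everything except the identifier columns
by_exclusion <- pivot_longer(toy, cols = !c(Gene, DE))

# Check that the three selections give the same long table
identical(by_prefix, by_range)
identical(by_prefix, by_exclusion)
```

Selecting by prefix or by exclusion is convenient when there are many sample columns, but it relies on the column names following a consistent pattern.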

We can specify the names of the new columns as we create them:

pivot_longer(geneexp,
             cols = c("sample1", "sample2"),
             names_to = "samples",
             values_to = "expression")
## # A tibble: 40 × 4
##    Gene  DE    samples expression
##    <chr> <chr> <chr>        <dbl>
##  1 DKK1  No    sample1       9.06
##  2 DKK1  No    sample2       5.27
##  3 TP53  No    sample1       3.57
##  4 TP53  No    sample2       8.55
##  5 BRCA1 No    sample1       7.39
##  6 BRCA1 No    sample2       8.24
##  7 AKT3  Down  sample1      15.1 
##  8 AKT3  Down  sample2       1.57
##  9 CCND1 No    sample1       6.74
## 10 CCND1 No    sample2      10.1 
## # ℹ 30 more rows
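
As a sanity check, the long table should contain one row per gene and sample, and pivot_wider() should bring back the original layout. The following is a minimal sketch on a hypothetical two-row tibble called toy rather than on the full geneexp object.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for geneexp (two rows from the table above)
toy <- tibble(
  Gene    = c("DKK1", "TP53"),
  DE      = c("No", "No"),
  sample1 = c(9.06, 3.57),
  sample2 = c(5.27, 8.55)
)

# Wide to long, naming the new columns explicitly
long <- pivot_longer(toy,
                     cols = c("sample1", "sample2"),
                     names_to = "samples",
                     values_to = "expression")

# One row per gene and sample: number of genes times number of sample columns
nrow(long) == nrow(toy) * 2

# Long back to wide: column names come from "samples", values from "expression"
wide <- pivot_wider(long, names_from = samples, values_from = expression)

# Compare the round-trip result with the starting table
all.equal(wide, toy)
```

The same check explains the row count above: the number of rows in the long table is the number of rows in geneexp multiplied by the number of pivoted columns.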