9.6 From wide to long format
The wide format is what you would typically have in a table with measurements, such as genes in rows and samples in columns.
However, as we have seen, {ggplot2} expects data in a long format.
In a long format, one row corresponds to one observation/measurement, with all the information associated with it.
{tidyr} provides pivot_longer() to convert a wide format to a long format, and pivot_wider() to convert a long format to a wide format.
Our object geneexp is in a wide format. Two columns, sample1 and sample2, contain the expression values:
## # A tibble: 5 × 4
##   Gene  DE    sample1 sample2
##   <chr> <chr>   <dbl>   <dbl>
## 1 DKK1  No       9.06    5.27
## 2 TP53  No       3.57    8.55
## 3 BRCA1 No       7.39    8.24
## 4 AKT3  Down    15.1     1.57
## 5 CCND1 No       6.74   10.1
To convert it to a long format, we will create:
- One column that contains the sample names
- One column that contains the expression values
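A call along the following lines would produce output like the one shown next. This is a minimal sketch, not necessarily the exact call used here: the geneexp built below is a stand-in reconstructed from the five rows printed above (so the row count will differ from the full object), and selecting the columns with cols = c(sample1, sample2) is an assumption.

```r
library(tibble)
library(tidyr)

# Stand-in for geneexp: only the five rows printed above (assumption)
geneexp <- tribble(
  ~Gene,   ~DE,    ~sample1, ~sample2,
  "DKK1",  "No",    9.06,     5.27,
  "TP53",  "No",    3.57,     8.55,
  "BRCA1", "No",    7.39,     8.24,
  "AKT3",  "Down", 15.1,      1.57,
  "CCND1", "No",    6.74,    10.1
)

# Gather the two sample columns into rows.
# With no further arguments, the new columns are called "name" and "value".
geneexp_long <- pivot_longer(geneexp, cols = c(sample1, sample2))
geneexp_long
```

Columns that are not listed in cols (here Gene and DE) are kept and repeated for every new row.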
## # A tibble: 40 × 4
##    Gene  DE    name    value
##    <chr> <chr> <chr>   <dbl>
##  1 DKK1  No    sample1  9.06
##  2 DKK1  No    sample2  5.27
##  3 TP53  No    sample1  3.57
##  4 TP53  No    sample2  8.55
##  5 BRCA1 No    sample1  7.39
##  6 BRCA1 No    sample2  8.24
##  7 AKT3  Down  sample1 15.1
##  8 AKT3  Down  sample2  1.57
##  9 CCND1 No    sample1  6.74
## 10 CCND1 No    sample2 10.1
## # ℹ 30 more rows
We can specify the names of the new columns as we create them:
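The names_to and values_to arguments of pivot_longer() set the names of the two new columns. The sketch below again uses a stand-in geneexp rebuilt from the five printed rows, and the column selection is an assumption; the new column names match the output that follows.

```r
library(tibble)
library(tidyr)

# Stand-in for geneexp: only the five rows printed earlier (assumption)
geneexp <- tribble(
  ~Gene,   ~DE,    ~sample1, ~sample2,
  "DKK1",  "No",    9.06,     5.27,
  "TP53",  "No",    3.57,     8.55,
  "BRCA1", "No",    7.39,     8.24,
  "AKT3",  "Down", 15.1,      1.57,
  "CCND1", "No",    6.74,    10.1
)

# names_to: column that receives the former column names (sample1, sample2)
# values_to: column that receives the expression values
geneexp_long <- pivot_longer(
  geneexp,
  cols = c(sample1, sample2),
  names_to = "samples",
  values_to = "expression"
)
geneexp_long
```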
## # A tibble: 40 × 4
##    Gene  DE    samples expression
##    <chr> <chr> <chr>        <dbl>
##  1 DKK1  No    sample1       9.06
##  2 DKK1  No    sample2       5.27
##  3 TP53  No    sample1       3.57
##  4 TP53  No    sample2       8.55
##  5 BRCA1 No    sample1       7.39
##  6 BRCA1 No    sample2       8.24
##  7 AKT3  Down  sample1      15.1
##  8 AKT3  Down  sample2       1.57
##  9 CCND1 No    sample1       6.74
## 10 CCND1 No    sample2      10.1
## # ℹ 30 more rows
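For completeness, pivot_wider() performs the reverse operation. The sketch below starts from a small long-format stand-in (two genes taken from the rows printed above; the object name geneexp_long is an assumption) and spreads the samples back into separate columns.

```r
library(tibble)
library(tidyr)

# Long-format stand-in built from rows printed above (assumption)
geneexp_long <- tribble(
  ~Gene,  ~DE,    ~samples,  ~expression,
  "DKK1", "No",   "sample1",  9.06,
  "DKK1", "No",   "sample2",  5.27,
  "AKT3", "Down", "sample1", 15.1,
  "AKT3", "Down", "sample2",  1.57
)

# names_from: column whose values become the new column names
# values_from: column whose values fill those new columns
geneexp_wide <- pivot_wider(
  geneexp_long,
  names_from = samples,
  values_from = expression
)
geneexp_wide
```

The result has one row per gene again, with the columns Gene, DE, sample1 and sample2.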