21.2 From wide to long format
The wide format is the layout you would typically have in a table of measurements, such as genes in rows and samples in columns, with one measurement per cell.
However, we have seen that ggplot2 sometimes requires data to be converted to a long format.
In long format, each row corresponds to a single observation (measurement) and holds all the information associated with it.
{tidyr} provides pivot_longer() to convert from wide to long format, and pivot_wider() to convert from long to wide format.
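As a minimal sketch of how the two functions mirror each other, here is a round trip on a tiny made-up tibble (the object names `wide`, `long` and `back`, and the columns `gene`, `s1`, `s2`, are hypothetical and only serve the illustration):

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: 2 genes (rows) x 2 samples (columns)
wide <- tibble(gene = c("A", "B"), s1 = c(1, 2), s2 = c(3, 4))

# Wide -> long: one row per gene-sample pair (4 rows)
long <- pivot_longer(wide, cols = c(s1, s2),
                     names_to = "sample", values_to = "value")

# Long -> wide: spread the sample names back into columns
back <- pivot_wider(long, names_from = sample, values_from = value)
```

`long` has the columns `gene`, `sample` and `value`, and `back` recovers the original layout of `wide`.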
Our object geneexp is in wide format: the expression values are spread over several columns, one per sample (WT1 to WT5 and KO1 to KO8):
## # A tibble: 5 × 17
## GeneSymbol log2FoldChange padj DE WT1 WT2 WT3 WT4 WT5 KO1 KO2 KO3 KO4
## <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Adh7 0.715 0.00337 UP 8.33 7.25 8.22 7.44 7.87 8.88 8.78 8.74 8.54
## 2 Cdk5r1 -0.564 0.0364 DOWN 7.96 7.41 8.21 8.07 8.00 7.31 6.87 7.31 7.24
## 3 Bcat1 0.702 0.0128 UP 10.0 8.90 8.83 9.01 9.29 10.2 10.1 10.0 10.1
## 4 Asb1 0.598 0.0147 UP 7.49 7.22 6.69 7.60 6.93 7.71 7.50 8.24 7.81
## 5 AU020206 0.545 0.0104 UP 7.66 7.71 7.13 7.69 7.42 8.08 8.41 7.87 8.23
## # ℹ 4 more variables: KO5 <dbl>, KO6 <dbl>, KO7 <dbl>, KO8 <dbl>
To convert to long format, we pivot the data frame (specifically, the columns that contain expression data) and create:
- One column that contains the sample names
- One column that contains all expression values
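To see what this does to the dimensions, here is a sketch on a hypothetical mock tibble (`mock`, filled with placeholder zeros, not the real data) that only copies the column layout of the 5 rows printed above. Pivoting the 13 sample columns turns each gene into 13 rows, so 5 × 13 = 65 rows, while the 4 annotation columns are repeated and joined by the 2 new columns. When no names are given, pivot_longer() calls them `name` and `value`:

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical mock with the same column layout as the 5 x 17 tibble above
samples <- c(paste0("WT", 1:5), paste0("KO", 1:8))
annot <- tibble(GeneSymbol = paste0("gene", 1:5),
                log2FoldChange = 0, padj = 1, DE = "UP")
expr <- as_tibble(matrix(0, nrow = 5, ncol = 13,
                         dimnames = list(NULL, samples)))
mock <- bind_cols(annot, expr)

# Pivot only the sample columns, keeping the default new column names
long <- pivot_longer(mock, cols = c(starts_with("WT"), starts_with("KO")))
```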
We can specify the names of the new columns as we create them: