7.2 Import / read in data

7.2.1 Load package into environment

From the RStudio interface: in the Packages tab of the bottom-right panel, search for the package name and tick its box:

[Screenshot: ticking readr in the Packages tab]

From the R console:

library(readr)
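
If library() reports that there is no package called readr, the package is probably not installed yet. A minimal sketch of a check-then-load pattern, assuming your machine has internet access and is allowed to install packages from CRAN:

```r
# Install readr only if it is missing (assumes internet access and a configured CRAN mirror)
if (!requireNamespace("readr", quietly = TRUE)) {
  install.packages("readr")
}

# Attach the package for the current R session
library(readr)

# Check that readr is now on the search path
"package:readr" %in% search()
```

Ticking the box in the Packages tab and calling library() do the same thing; the console version has the advantage that it can be saved in a script.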

7.2.2 From CSV

Let’s now import the content of a first file into our environment.

There are several ways we can specify the path / location of a file:

  • Using the “absolute path” (the full location of the file; here ~ is shorthand for your home directory):
# absolute path
geneexp <- read_csv(file="~/Documents/DataViz_R/DataViz_source_files-main/files/expression_20genes.csv")
  • Using the “relative path” (i.e. relative to where the session and R project are currently located), e.g.:
# relative path (this assumes you are in the course folder)
geneexp <- read_csv(file="DataViz_source_files-main/files/expression_20genes.csv")

Because your working directory is DataViz_R, R can find the DataViz_source_files-main folder without needing the full path: this is the difference between a relative and an absolute path.
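
If R cannot find the file, it usually helps to check where the session is currently located. The following is a small sketch using base R functions only; the variable name rel_path is just an illustration, and the results depend on your own machine:

```r
# Print the current working directory
# (DataViz_R is expected if you opened the course project)
getwd()

# The relative path used in the example above
rel_path <- "DataViz_source_files-main/files/expression_20genes.csv"

# TRUE if R can find the file from the working directory, FALSE otherwise
file.exists(rel_path)

# Show the full path the relative path resolves to
# (the path may be returned unchanged if the file cannot be found)
normalizePath(rel_path, mustWork = FALSE)

# "~" is expanded to your home directory
path.expand("~")
```

If file.exists() returns FALSE, either the working directory is not the course folder or the file is located somewhere else.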

The content of the file expression_20genes.csv is now stored in the object named geneexp.

The function also outputs some information about the data you are importing:

[Screenshot: message printed by read_csv() on import]

For example, it reports that:

  • The data contains 20 rows (observations), and 4 columns (variables).
  • Out of these 4 columns:
    • 2 contain characters (chr): Gene and DE.
    • 2 contain numbers (dbl for “double”): sample1 and sample2.
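
The same information can be retrieved from the imported object at any time. As a sketch, the example below writes a hypothetical two-row file with the same column layout as expression_20genes.csv to a temporary location (so that it runs without the course files), then inspects it; the object name mini is an assumption made for illustration:

```r
library(readr)

# Hypothetical miniature file with the same four columns as expression_20genes.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("Gene,DE,sample1,sample2",
             "DKK1,No,9.06,5.27",
             "AKT3,Down,15.1,1.57"),
           tmp)

# show_col_types = FALSE silences the import message (readr 2.0 or later)
mini <- read_csv(file = tmp, show_col_types = FALSE)

# Number of rows (observations) and columns (variables)
dim(mini)

# Type of each column: chr columns are "character", dbl columns are "numeric"
sapply(mini, class)

# Column specification that read_csv() guessed on import
spec(mini)
```

With your own data, replace mini with geneexp to get the dimensions and column types of the full file.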

Notes:

  • Objects you create can be found in the Environment tab in the upper-right panel.
  • If you click on an object name in the Environment tab, it will open in the upper-left panel. Let’s try with geneexp:

[Screenshot: geneexp opened from the Environment tab]
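
The Environment tab has console equivalents. The sketch below uses a small stand-in object, geneexp_demo (a hypothetical name, built from the first two genes of the table), so that it runs on its own; with the course data you would use geneexp instead:

```r
# Stand-in for the imported data (hypothetical object, two genes only)
geneexp_demo <- data.frame(
  Gene    = c("DKK1", "TP53"),
  DE      = c("No", "No"),
  sample1 = c(9.06, 3.57),
  sample2 = c(5.27, 8.55)
)

# List the objects in the current environment (what the Environment tab shows)
ls()

# Print the first rows of the object
head(geneexp_demo)

# Compact overview: dimensions, column names and column types
str(geneexp_demo)

# In RStudio, View() opens the object in the upper-left panel,
# like clicking its name in the Environment tab:
# View(geneexp_demo)
```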

7.2.3 From Excel

The {tidyverse} includes the {readxl} package, which provides functions to read in Excel files.

Although working with text files (.txt, .csv, .tsv etc.) is better practice, you can import Excel files using the read_excel() function.

First, load the {readxl} package (bottom-right panel -> Packages -> search and tick readxl, or from the console, as shown below).

library(readxl)
# Relative path:
read_excel(path="DataViz_source_files-main/files/expression_20genes.xlsx")
## # A tibble: 20 × 4
##    Gene   DE    sample1 sample2
##    <chr>  <chr>   <dbl>   <dbl>
##  1 DKK1   No     9.06      5.27
##  2 TP53   No     3.57      8.55
##  3 BRCA1  No     7.39      8.24
##  4 AKT3   Down  15.1       1.57
##  5 CCND1  No     6.74     10.1 
##  6 AXL    No    13.5      16.6 
##  7 STAT3  Down  15.2       5.46
##  8 CCL1   No     5.28      7.09
##  9 TRAF2  No     8.93     12.9 
## 10 IL1R   No     8.46     15.3 
## 11 TAB2   No     9.76     14.6 
## 12 HPK1   Down  14.1       7.34
## 13 TLR8   Up     2.69     16.3 
## 14 TGFB   No     7.83     12.5 
## 15 STAT5  Down  18.6       9.21
## 16 ADAM17 Down  16.1      10.3 
## 17 PTEN   Up     0.0210   11.2 
## 18 SMRT   No    11.7      16.9 
## 19 DVL    No     4.33      6.84
## 20 MAPK2  Up     0.998     9.56

If your Excel file contains multiple sheets, you can choose which one to read with the sheet parameter, giving either the sheet name or its position:

read_excel(path="DataViz_source_files-main/files/expression_20genes.xlsx",
           sheet="tab1")
## # A tibble: 20 × 4
##    Gene   DE    sample1 sample2
##    <chr>  <chr>   <dbl>   <dbl>
##  1 DKK1   No     9.06      5.27
##  2 TP53   No     3.57      8.55
##  3 BRCA1  No     7.39      8.24
##  4 AKT3   Down  15.1       1.57
##  5 CCND1  No     6.74     10.1 
##  6 AXL    No    13.5      16.6 
##  7 STAT3  Down  15.2       5.46
##  8 CCL1   No     5.28      7.09
##  9 TRAF2  No     8.93     12.9 
## 10 IL1R   No     8.46     15.3 
## 11 TAB2   No     9.76     14.6 
## 12 HPK1   Down  14.1       7.34
## 13 TLR8   Up     2.69     16.3 
## 14 TGFB   No     7.83     12.5 
## 15 STAT5  Down  18.6       9.21
## 16 ADAM17 Down  16.1      10.3 
## 17 PTEN   Up     0.0210   11.2 
## 18 SMRT   No    11.7      16.9 
## 19 DVL    No     4.33      6.84
## 20 MAPK2  Up     0.998     9.56
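
If you do not know the sheet names in a workbook, readxl can list them with excel_sheets(). The sketch below uses datasets.xlsx, an example workbook that ships with readxl, as a stand-in for your own file; the object names demo_xlsx, pos, by_name and by_position are illustrative only:

```r
library(readxl)

# readxl_example() returns the path to an example workbook shipped with readxl
# (used here as a stand-in for your own .xlsx file)
demo_xlsx <- readxl_example("datasets.xlsx")

# List the names of all sheets in the workbook
excel_sheets(demo_xlsx)

# Read a sheet by name ...
by_name <- read_excel(path = demo_xlsx, sheet = "mtcars")

# ... or by position: look up where the "mtcars" sheet sits in the workbook
pos <- match("mtcars", excel_sheets(demo_xlsx))
by_position <- read_excel(path = demo_xlsx, sheet = pos)

# Both calls read the same sheet
identical(by_name, by_position)
```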

Note: parameters in a function call are comma-separated:

  • path is the first parameter
  • sheet is the second parameter
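
Because path and sheet are the first and second parameters of read_excel(), their values can be given either by name or by position. A short sketch, again using the example workbook that ships with readxl rather than the course file (demo_xlsx, named and positional are hypothetical names):

```r
library(readxl)

# Example workbook shipped with readxl (stand-in for your own file)
demo_xlsx <- readxl_example("datasets.xlsx")

# Named parameters: the order in which they are written does not matter
named <- read_excel(sheet = "iris", path = demo_xlsx)

# Positional: the first value is matched to path, the second to sheet
positional <- read_excel(demo_xlsx, "iris")

# Both calls return the same table
identical(named, positional)
```

Naming the parameters is slightly longer to type but makes the call easier to read, which is why the examples in this section write path= and sheet= explicitly.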