3.2 Functions & packages

3.2.1 Functions

A function in R is a piece of code that takes input (user data, parameters), performs a calculation, and returns output.

For example, the mean() function takes a vector (a series of numbers) as input, calculates their average, and returns it.

Functions can take arguments (also called parameters). In the example above, the main argument to mean() is the series of numbers supplied by the user.
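As a small illustration of calling mean() with arguments, here is a minimal sketch. The vectors `values` and `with_missing` are made-up data for this example; `na.rm` is an additional argument of mean() that tells it to ignore missing values (NA).

```r
# The main argument: a numeric vector (illustrative values).
values <- c(2, 4, 6, 8)
average <- mean(values)
average   # 5

# A vector containing a missing value (NA).
with_missing <- c(2, 4, NA, 6)

# By default, a missing value makes the result NA.
average_default <- mean(with_missing)

# The extra argument na.rm = TRUE removes missing values first.
average_no_na <- mean(with_missing, na.rm = TRUE)
average_no_na   # 4
```

The first argument carries the data; further named arguments such as `na.rm` adjust how the calculation is done.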

In R code, you can recognize functions by the parentheses (“round brackets”) that follow their name.

3.2.2 Packages

3.2.2.1 What are packages?

A package in R stores, in a standardized format, a set of functions, data and documentation.

Packages are developed and shared by the community, and vary in size and complexity.

Installed packages are stored in a directory called a library.
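To see where the library is on your own computer, something like the following sketch can help. The exact paths differ from one machine to another, so no output is shown; {stats} is used here only because it ships with every R installation.

```r
# Directories that R searches for installed packages (the libraries).
library_dirs <- .libPaths()
library_dirs

# Folder where one particular installed package lives.
stats_dir <- find.package("stats")
stats_dir
```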

Packages are usually found in public repositories such as:

  • CRAN (general repository for any type of data analysis).
  • Bioconductor (initially specialized in high-throughput data analysis / bioinformatics).
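A typical way to get a package from these repositories and then use it is sketched below. The installation lines are commented out because they need an internet connection and only have to be run once; {readr} and {limma} are just example package names. The {tools} package ships with R, so the last lines run without installing anything.

```r
# Install from CRAN (run once per computer):
# install.packages("readr")

# Bioconductor packages are installed through the {BiocManager} package:
# install.packages("BiocManager")
# BiocManager::install("limma")

# Load an installed package in each new R session before using its functions.
library(tools)

# file_ext() comes from {tools}; the file name is a made-up example.
extension <- file_ext("counts.csv")
extension   # "csv"
```

Installing is done once; loading with library() is needed in every new R session.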

Anyone can create a package and store it locally; creating packages is a great way to share code.

The previous function, mean(), is part of the {base} package that is available by default.
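One way to check which package a function comes from is sketched below, using the `package::function` notation. The vector `values` is made up for the example.

```r
values <- c(1, 2, 3, 4)

# Calling mean() directly works because {base} is always loaded.
direct <- mean(values)

# The package::function notation names the package explicitly.
explicit <- base::mean(values)

direct     # 2.5
explicit   # 2.5

# Ask R which package the function mean() belongs to.
home_package <- environmentName(environment(mean))
home_package   # "base"
```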

3.2.2.2 The “tidyverse”

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

Why do we use the tidyverse packages in this course?

  • Easier to understand / more intuitive vocabulary: better for beginners.
  • More “modern” style of coding.
  • Uniform in style and logic across data manipulation and visualization.

In this course, we will use the following packages in particular, in this order:

  • {readr} for importing / exporting files.
  • {ggplot2} for data visualization.
  • {dplyr} for (simple) data manipulation and selection.
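To give a first impression of how these three packages fit together, here is a rough sketch, not the course's actual exercise. The data (genes A, B, C with counts) and the temporary file are invented for illustration, and the code only runs its main part if the three packages are already installed.

```r
# Check that the three packages are installed (without attaching them).
needed <- c("readr", "ggplot2", "dplyr")
have_all <- all(vapply(needed, requireNamespace, logical(1), quietly = TRUE))

if (have_all) {
  # Write made-up data to a temporary CSV file so the example is self-contained.
  csv_path <- tempfile(fileext = ".csv")
  toy_data <- data.frame(gene = c("A", "B", "C"), count = c(10, 250, 40))
  readr::write_csv(toy_data, csv_path)

  # {readr}: import the file (suppressMessages() hides the column summary).
  counts <- suppressMessages(readr::read_csv(csv_path))

  # {ggplot2}: build a bar plot of the counts (shown when `p` is printed).
  p <- ggplot2::ggplot(counts, ggplot2::aes(x = gene, y = count)) +
    ggplot2::geom_col()

  # {dplyr}: keep only the rows with a count above 20.
  high_counts <- dplyr::filter(counts, count > 20)
  high_counts   # genes B and C remain
}
```

The `package::function` notation is used here so that it is visible which package each function comes from; after `library(readr)` and so on, the prefixes can be dropped.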