20.1 Exploratory Data Analysis

We will use the package “car”. For details on this package, see https://cran.r-project.org/web/packages/car/car.pdf


Let’s explore the dataset “Davis” from the car package. Its help page is titled “Self-Reports of Height and Weight”. The subjects were men and women engaged in regular exercise.
This data frame contains the following columns:
- sex - F, female; M, male
- weight - measured weight in kg
- height - measured height in cm
- repwt - reported weight in kg
- repht - reported height in cm


20.1.1 Data dimensionality: functions str(), summary(), head(), tail()
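
As a minimal sketch of what these functions report, the block below applies them to a small made-up data frame. The object `toy` and all of its values are hypothetical; it only mirrors the column names of Davis, so substitute the real data frame when working through the section.

```r
# Hypothetical toy data with the same column names as Davis (values are made up)
toy <- data.frame(
  sex    = factor(c("M", "F", "F", "M", "F", "M")),
  weight = c(77, 58, 53, 68, 59, 76),
  height = c(182, 161, 161, 177, 157, 170),
  repwt  = c(77, 51, NA, 70, 59, 76),
  repht  = c(180, 163, 158, NA, 155, NA)
)

str(toy)        # compact overview: number of observations, variables and their types
dims <- dim(toy) # number of rows and number of columns
summary(toy)    # per-column summaries, including a count of NA values where present
first3 <- head(toy, 3)  # first three rows
last3  <- tail(toy, 3)  # last three rows
```

str() and summary() are usually the quickest way to spot unexpected column types or missing values before any further analysis.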


20.1.2 Missing (NA) values in data: functions complete.cases(), na.omit(), all.equal()

How many rows do not contain missing values (i.e., not a single ‘NA’)?

Exercise using complete.cases(): How many rows contain missing values (i.e., at least one ‘NA’)?
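
The following sketch shows how the three functions in this subsection fit together. It uses a hypothetical toy data frame (`toy`, with made-up values and the Davis column names) rather than the real data, so the exercise itself is left open.

```r
# Hypothetical toy data with the same column names as Davis (values are made up)
toy <- data.frame(
  sex    = factor(c("M", "F", "F", "M", "F", "M")),
  weight = c(77, 58, 53, 68, 59, 76),
  height = c(182, 161, 161, 177, 157, 170),
  repwt  = c(77, 51, NA, 70, 59, 76),
  repht  = c(180, 163, 158, NA, 155, NA)
)

ok <- complete.cases(toy)   # logical vector: TRUE where a row has no NA at all
n_complete   <- sum(ok)     # number of rows without missing values
n_incomplete <- sum(!ok)    # number of rows with at least one NA

clean <- na.omit(toy)       # drops every row that contains an NA

# na.omit() attaches an extra "na.action" attribute, so attributes are
# ignored when comparing it with manual subsetting
same <- all.equal(clean, toy[ok, ], check.attributes = FALSE)
```

Both routes keep the same rows; na.omit() is shorter, while the logical vector from complete.cases() can be reused, for example negated to inspect the incomplete rows.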


20.1.4 Exercises on data subsetting and missing values

  1. How many people shorter than 170 cm reported that they are taller?
  1. What proportion of men in the dataset did not report their height? And women?
  1. Is it true that the men who did not report their height also did not report their weight?
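
The exercises above combine logical subsetting with missing values. The sketch below illustrates the relevant techniques on a hypothetical toy data frame (`toy`, made-up values, different thresholds), not on Davis itself. The main point is that comparisons involving NA return NA, so counts need `na.rm = TRUE` or an explicit is.na() test.

```r
# Hypothetical toy data with the same column names as Davis (values are made up)
toy <- data.frame(
  sex    = factor(c("M", "F", "F", "M", "F", "M")),
  weight = c(77, 58, 53, 68, 59, 76),
  height = c(182, 161, 161, 177, 157, 170),
  repwt  = c(77, 51, NA, 70, 59, 76),
  repht  = c(180, 163, 158, NA, 155, NA)
)

# Logical subsetting: rows whose measured height is below 160 cm
short <- toy[toy$height < 160, ]

# A comparison with NA yields NA, so NA results are dropped when counting
n_over <- sum(toy$repht > toy$height, na.rm = TRUE)

# Proportion of missing reported heights within each sex:
# the mean of a logical vector is the share of TRUE values
prop_na <- tapply(is.na(toy$repht), toy$sex, mean)

# Cross-tabulate the two missingness indicators to see whether they coincide
miss_tab <- table(repht_missing = is.na(toy$repht),
                  repwt_missing = is.na(toy$repwt))
```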


20.1.7 Exercises using unique(), table() and cut()

Let’s assume that the record with the minimum height, i.e. the row where height == min(data$height), is an erroneous entry, and exclude it from the analysis.
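
One way to drop such a record is sketched below, assuming a hypothetical data frame `toy` with made-up values in which one row has an implausibly small height.

```r
# Hypothetical toy data (values are made up); the third row looks like a data-entry error
toy <- data.frame(
  height = c(182, 161, 57, 177, 157),
  weight = c(77, 58, 166, 68, 59)
)

# Keep every row whose height differs from the minimum
cleaned <- toy[toy$height != min(toy$height), ]

# Alternative by position; which.min() returns only the first minimum,
# so this drops a single row even if the minimum is tied
cleaned_pos <- toy[-which.min(toy$height), ]
```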

  1. How many unique values are there for the height?
  1. How many height intervals are obtained with breaks every 10 cm from the minimum to the maximum height? Use min() and max() inside seq(), and count the intervals with nlevels(). Be careful to include the maximum height in the last interval.
  1. How many women are in the last two intervals? (Read the answer off the table.)
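
The sketch below illustrates unique(), cut() and table() on hypothetical vectors (`heights` and `sex`, made-up values), not on Davis. It also shows the pitfall mentioned above: seq(min, max, by = 10) stops short of the maximum unless the range is an exact multiple of 10, so the break points are extended here with `length.out`, which is only one of several possible fixes.

```r
# Hypothetical toy vectors (values are made up)
heights <- c(148, 157, 161, 161, 170, 177, 182, 197)
sex     <- factor(c("F", "F", "F", "F", "M", "M", "M", "M"))

n_unique <- length(unique(heights))   # number of distinct height values

# Naive breaks: the last break can fall below max(heights)
naive <- seq(min(heights), max(heights), by = 10)

# Extended breaks: enough 10 cm steps to reach or pass the maximum
n_steps <- ceiling((max(heights) - min(heights)) / 10)
brks    <- seq(min(heights), by = 10, length.out = n_steps + 1)

# include.lowest = TRUE closes the first interval on the left, so the
# minimum height is not turned into NA
bins <- cut(heights, breaks = brks, include.lowest = TRUE)

n_bins  <- nlevels(bins)     # number of intervals
by_sex  <- table(bins, sex)  # counts per interval and sex
```

Checking `anyNA(bins)` after cut() is a cheap way to confirm that no value fell outside the breaks.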