9.9 Exercise 5. Data frame manipulation

Create the script “exercise5.R” and save it to the “Rcourse/Module1” directory: you will save all the commands of exercise 5 in that script.
Remember that you can comment your code using #.


9.9.1 Exercise 5a

1- Create the following data frame:

|43|181|M|
|34|172|F|
|22|189|M|
|27|167|F|

Use John, Jessica, Steve and Rachel as row names,
and Age, Height and Sex as column names. Name the data frame df.

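One possible way to build it, sketched below, passes the columns and the row names straight to data.frame(); the values are typed in from the table above.

```r
# One vector per column; row.names labels the rows
df <- data.frame(Age = c(43, 34, 22, 27),
                 Height = c(181, 172, 189, 167),
                 Sex = c("M", "F", "M", "F"),
                 row.names = c("John", "Jessica", "Steve", "Rachel"))
df
```

Setting the labels afterwards with rownames(df) <- ... and colnames(df) <- ... works just as well. Since R 4.0.0, text columns stay character by default; pass stringsAsFactors = TRUE if you want Sex stored as a factor.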

2- Check the structure of df with str().

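A minimal sketch; df is rebuilt first so the snippet runs on its own.

```r
df <- data.frame(Age = c(43, 34, 22, 27),
                 Height = c(181, 172, 189, 167),
                 Sex = c("M", "F", "M", "F"),
                 row.names = c("John", "Jessica", "Steve", "Rachel"))
# str() prints the class and dimensions, then one line per column
# with its name, type and first values
str(df)
```

For this df, str() should report 4 obs. of 3 variables, with Age and Height as num and Sex as chr (a Factor in R versions before 4.0.0).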

3- Calculate the average age and height in df.

Try different approaches:
  • Calculate the average of each column separately.


  • Calculate the average of both columns simultaneously using the apply() function.

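Both approaches can be sketched as follows (df is rebuilt so the snippet is self-contained):

```r
df <- data.frame(Age = c(43, 34, 22, 27),
                 Height = c(181, 172, 189, 167),
                 Sex = c("M", "F", "M", "F"),
                 row.names = c("John", "Jessica", "Steve", "Rachel"))

# Approach 1: extract each column with $ and average it with mean()
avg_age <- mean(df$Age)        # 31.5
avg_height <- mean(df$Height)  # 177.25

# Approach 2: apply() works on a matrix, so keep only the numeric columns
# (the conversion then gives a numeric matrix) and apply mean() to each
# column (MARGIN = 2)
averages <- apply(df[, c("Age", "Height")], 2, mean)
averages
```

Passing the whole of df to apply() would turn it into a character matrix because of the Sex column, and mean() would then return NA with a warning. colMeans(df[, c("Age", "Height")]) is a built-in shortcut for the same result.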

4- Add one row to df: Georges, who is 53 years old and 168 cm tall.

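The exercise does not give Georges' sex, so the sketch below assumes "M". rbind() matches columns by name, so the new row is built as a one-row data frame with the same column names.

```r
df <- data.frame(Age = c(43, 34, 22, 27),
                 Height = c(181, 172, 189, 167),
                 Sex = c("M", "F", "M", "F"),
                 row.names = c("John", "Jessica", "Steve", "Rachel"))
# One-row data frame for the new patient; Sex = "M" is an assumption
georges <- data.frame(Age = 53, Height = 168, Sex = "M", row.names = "Georges")
# rbind() appends it below the existing rows and keeps its row name
df <- rbind(df, georges)
df
```

Assigning to a new row name, df["Georges", ] <- list(53, 168, "M"), is another way to add the row.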

5- Change the row names of df so the data becomes anonymous: Use Patient1, Patient2, etc. instead of actual names.

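A short sketch using paste0() to generate the labels:

```r
df <- data.frame(Age = c(43, 34, 22, 27),
                 Height = c(181, 172, 189, 167),
                 Sex = c("M", "F", "M", "F"),
                 row.names = c("John", "Jessica", "Steve", "Rachel"))
# One label per row: "Patient1", "Patient2", ...
rownames(df) <- paste0("Patient", seq_len(nrow(df)))
df
```

Using nrow(df) rather than a fixed count keeps the labels correct whether or not the Georges row from step 4 has been added.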

6- Create the data frame df2, a subset of df that contains only the female entries.

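One way to do it is logical indexing, sketched here on a freshly built df:

```r
df <- data.frame(Age = c(43, 34, 22, 27),
                 Height = c(181, 172, 189, 167),
                 Sex = c("M", "F", "M", "F"),
                 row.names = c("John", "Jessica", "Steve", "Rachel"))
# df$Sex == "F" is TRUE for the rows to keep; the empty index after
# the comma keeps every column
df2 <- df[df$Sex == "F", ]
df2
```

subset(df, Sex == "F") returns the same rows.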

7- Create the data frame df3, a subset of df that contains only the entries of males taller than 170 cm.

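The same indexing idea with two conditions combined by &, which keeps a row only when both are TRUE:

```r
df <- data.frame(Age = c(43, 34, 22, 27),
                 Height = c(181, 172, 189, 167),
                 Sex = c("M", "F", "M", "F"),
                 row.names = c("John", "Jessica", "Steve", "Rachel"))
# Male AND taller than 170
df3 <- df[df$Sex == "M" & df$Height > 170, ]
df3
```

subset(df, Sex == "M" & Height > 170) is equivalent.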

9.9.2 Exercise 5b

1- Create the two data frames mydf1 and mydf2:

mydf1:

|1|14|
|2|12|
|3|15|
|4|10|

mydf2:

|1|paul|
|2|helen|
|3|emily|
|4|john|
|5|mark|

Use the column names “id” and “age” for mydf1, and “id” and “name” for mydf2.

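A minimal sketch, with the values typed in from the two tables above:

```r
# ids 1 to 4 with their ages
mydf1 <- data.frame(id = 1:4, age = c(14, 12, 15, 10))
# ids 1 to 5 with their names
mydf2 <- data.frame(id = 1:5, name = c("paul", "helen", "emily", "john", "mark"))
mydf1
mydf2
```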

2- Merge mydf1 and mydf2 by their “id” column and store the result in mydf3. Look at the help page of merge and/or Google it!

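A sketch that stores the merged table in mydf3, the name used in the next step:

```r
mydf1 <- data.frame(id = 1:4, age = c(14, 12, 15, 10))
mydf2 <- data.frame(id = 1:5, name = c("paul", "helen", "emily", "john", "mark"))
# by = "id" pairs up rows that share the same id value
mydf3 <- merge(mydf1, mydf2, by = "id")
mydf3
```

With the default all = FALSE, only ids present in both tables are kept, so mark (id 5, no age) is dropped; merge(mydf1, mydf2, by = "id", all = TRUE) keeps him, with NA as his age.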

3- Order mydf3 by decreasing age. Look at the help page of order.

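order() returns row positions rather than sorted values, so its result is used as a row index. A self-contained sketch:

```r
mydf1 <- data.frame(id = 1:4, age = c(14, 12, 15, 10))
mydf2 <- data.frame(id = 1:5, name = c("paul", "helen", "emily", "john", "mark"))
mydf3 <- merge(mydf1, mydf2, by = "id")
# Row positions sorted by age, largest first
mydf3 <- mydf3[order(mydf3$age, decreasing = TRUE), ]
mydf3
```

For a numeric column, order(-mydf3$age) gives the same ordering.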

9.9.3 Exercise 5c

1- Using the download.file function, download this file to your current directory (right-click on “this file” and choose “Copy link location” to get the full path).

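The link behind “this file” is not reproduced here, so the sketch below wraps download.file() in a small helper (download_rdata is a hypothetical name, not part of the course) and leaves the URL for you to paste. mode = "wb" writes the file in binary mode, which matters on Windows for a binary file such as an .RData file.

```r
# Hypothetical helper: download a binary file into the working directory
download_rdata <- function(file_url, destfile = "genes_dataframe.RData") {
  # mode = "wb" avoids text-mode corruption of binary files on Windows
  download.file(url = file_url, destfile = destfile, mode = "wb")
  invisible(destfile)
}

# Usage, with the link copied from "this file":
# download_rdata("<paste the copied link here>")
```

Calling download.file(url = ..., destfile = "genes_dataframe.RData", mode = "wb") directly does exactly the same thing.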

2- The function dir() lists the files and directories present in the current directory: check whether genes_dataframe.RData was downloaded.

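A short sketch; the result depends on what your working directory actually contains.

```r
# dir() (same as list.files()) returns the file names in the working directory
dir()
# TRUE if the downloaded file is among them
found <- "genes_dataframe.RData" %in% dir()
found
```

file.exists("genes_dataframe.RData") answers the same question directly.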

3- Load genes_dataframe.RData into your environment. Use the load function.

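So that the sketch runs without the real file, it first saves a small hypothetical stand-in object (invented values) to a temporary path and then loads it back; with the downloaded file, only the commented load() line is needed.

```r
# Hypothetical stand-in for the downloaded file, written to a temp path
rdata_path <- file.path(tempdir(), "genes_dataframe.RData")
df_genes <- data.frame(gene_symbol = c("Zfp101", "GeneA"),
                       log2FoldChange_KOvsWT = c(1.2, 0.9),
                       pvalue_KOvsWT = c(0.001, 0.01))
save(df_genes, file = rdata_path)
rm(df_genes)  # remove it so that load() has something to restore

# With the real file in the working directory:
# load("genes_dataframe.RData")
loaded <- load(rdata_path)  # load() returns the names of the restored objects
loaded
```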

4- genes_dataframe.RData contains the df_genes object: is it now present in your environment?

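Two ways to check, sketched below; exists() returns FALSE if the object was not loaded.

```r
# ls() lists the objects in your workspace
ls()
# TRUE if an object called df_genes can be found
exists("df_genes")
```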

5- Explore df_genes and see what it contains. You can use a variety of functions: str, head, tail, dim, colnames, rownames, class…

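The real content of df_genes is not shown here, so the sketch below runs the suggested functions on a small hypothetical stand-in. Its values and the gene_symbol column name are invented for illustration and may not match the real object; only the two column names from the exercise are taken from the text.

```r
# Hypothetical stand-in for df_genes (invented values)
df_genes <- data.frame(
  gene_symbol = c("Zfp101", "GeneA", "Zfp202", "GeneB", "GeneC", "Zfp303", "GeneD"),
  log2FoldChange_KOvsWT = c(1.2, 0.9, -0.8, 2.1, -1.5, 0.7, 0.2),
  pvalue_KOvsWT = c(0.001, 0.01, 0.02, 0.30, 0.04, 0.003, 0.01)
)
class(df_genes)    # kind of object
dim(df_genes)      # number of rows (genes) and columns
str(df_genes)      # type and first values of each column
head(df_genes, 3)  # first 3 rows
tail(df_genes, 3)  # last 3 rows
colnames(df_genes)
rownames(df_genes)
```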

6- Select the rows for which pvalue_KOvsWT < 0.05 AND log2FoldChange_KOvsWT > 0.5. Store the result in the up object.


How many rows (genes) were selected?
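A sketch on a hypothetical stand-in for df_genes (invented values; the gene_symbol column is an assumption). The count printed here describes the toy table only, not the real data.

```r
# Hypothetical stand-in for df_genes (invented values)
df_genes <- data.frame(
  gene_symbol = c("Zfp101", "GeneA", "Zfp202", "GeneB", "GeneC", "Zfp303", "GeneD"),
  log2FoldChange_KOvsWT = c(1.2, 0.9, -0.8, 2.1, -1.5, 0.7, 0.2),
  pvalue_KOvsWT = c(0.001, 0.01, 0.02, 0.30, 0.04, 0.003, 0.01)
)
# & keeps the rows where both conditions are TRUE
up <- df_genes[df_genes$pvalue_KOvsWT < 0.05 &
                 df_genes$log2FoldChange_KOvsWT > 0.5, ]
nrow(up)  # number of selected genes (3 for this toy table)
```

If pvalue_KOvsWT contains NA values, df_genes[condition, ] returns rows filled with NA for them; wrapping the condition in which() or using subset() drops those rows.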

7- From the up object, select the zinc finger protein-coding genes (i.e., genes whose symbol starts with Zfp). Use the grep() function.

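A sketch on the same kind of hypothetical stand-in (invented values; the gene_symbol column and the up_zfp name are assumptions). grep() returns the positions of the matching elements, which can be used directly as row indices.

```r
# Hypothetical stand-in for df_genes (invented values)
df_genes <- data.frame(
  gene_symbol = c("Zfp101", "GeneA", "Zfp202", "GeneB", "GeneC", "Zfp303", "GeneD"),
  log2FoldChange_KOvsWT = c(1.2, 0.9, -0.8, 2.1, -1.5, 0.7, 0.2),
  pvalue_KOvsWT = c(0.001, 0.01, 0.02, 0.30, 0.04, 0.003, 0.01)
)
up <- df_genes[df_genes$pvalue_KOvsWT < 0.05 &
                 df_genes$log2FoldChange_KOvsWT > 0.5, ]
# "^" anchors the pattern at the start of the symbol
up_zfp <- up[grep("^Zfp", up$gene_symbol), ]
up_zfp
```

Without "^", grep("Zfp", ...) would also match symbols that merely contain Zfp. If the real object stores gene symbols as row names, use rownames(up) instead of up$gene_symbol.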

8- Select the rows for which pvalue_KOvsWT < 0.05 AND log2FoldChange_KOvsWT is either > 0.5 OR < -0.5. For the log2FoldChange condition, give the abs function a try!
Store the result in the diff_genes object.


How many rows (genes) were selected?
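A sketch on a hypothetical stand-in for df_genes (invented values; the gene_symbol column is an assumption). abs() folds both directions of change into a single test; the count printed describes the toy table only.

```r
# Hypothetical stand-in for df_genes (invented values)
df_genes <- data.frame(
  gene_symbol = c("Zfp101", "GeneA", "Zfp202", "GeneB", "GeneC", "Zfp303", "GeneD"),
  log2FoldChange_KOvsWT = c(1.2, 0.9, -0.8, 2.1, -1.5, 0.7, 0.2),
  pvalue_KOvsWT = c(0.001, 0.01, 0.02, 0.30, 0.04, 0.003, 0.01)
)
# abs(x) > 0.5 is the same as x > 0.5 | x < -0.5
diff_genes <- df_genes[df_genes$pvalue_KOvsWT < 0.05 &
                         abs(df_genes$log2FoldChange_KOvsWT) > 0.5, ]
nrow(diff_genes)  # number of selected genes (5 for this toy table)
```

Without abs(), the fold-change test must be written in parentheses, (x > 0.5 | x < -0.5), because & is evaluated before | in R.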