11.3 Exercise 6.
Create the script “exercise6.R” and save it to the “Rcourse/Module2” directory: you will save all the commands of exercise 6 in that script.
Remember you can comment the code using #.
11.3.1 Exercise 6a. Input / output
1- Download the folder “i_o_files” into your current directory with:
# system invokes the OS command specified by the "command" argument.
system(command="svn export https://github.com/sarahbonnin/CRG_RIntroduction/trunk/i_o_files")
All files used in exercise 6 are in the i_o_files folder.
2- Read in the content of ex6a_input.txt using the scan function; save it in object z.
How many elements are in z?
3- Sort z: save sorted vector in object “zsorted”.
4- Write zsorted content into file ex6a_output.txt.
5- Check the file you produced in the RStudio file browser (click on the file in the “Files” tab of the bottom-right panel). Save the content of zsorted again, but this time set the argument “ncolumns” to 1: how is the file different?
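The steps of exercise 6a can be sketched on a small stand-in file. This is only an illustration, not the correction for ex6a_input.txt: the temporary files, their contents and the object names ending in "_demo" are assumptions made up for the example.

```r
# Hypothetical input: a few numbers written to a temporary file
infile <- tempfile(fileext = ".txt")
writeLines(c("5 3 9", "1 7"), infile)

# scan() reads whitespace-separated values into a (numeric) vector
z_demo <- scan(infile)
length(z_demo)                 # number of elements read

# sort() returns the values in increasing order
zsorted_demo <- sort(z_demo)

# write() puts 5 values per line by default for a numeric vector
outfile <- tempfile(fileext = ".txt")
write(zsorted_demo, file = outfile)
readLines(outfile)

# with ncolumns = 1, each value is written on its own line
write(zsorted_demo, file = outfile, ncolumns = 1)
readLines(outfile)
```

Comparing the two calls to readLines shows the same values laid out on one line, then on one line per value.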
11.3.2 Exercise 6b - I/O on data frame: play with the arguments of read.table
1- field separator
- Read ex6b_IO_commas_noheader.txt into object fs. What are the dimensions of fs?
- Fields/columns are separated by commas: change the default value of the “sep” argument and read in the file again. What are the dimensions of fs now?
2- field separator + header
- Read ex6b_IO_commas_header.txt into object fs_c. What are the dimensions of fs_c?
- Check head(fs_c) and change the default field separator to an appropriate one.
- The first row should be the header (column names): change the default value of the header parameter and read in the file again. What are the dimensions of fs_c now?
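To see how “sep” and “header” change the dimensions, here is a sketch on a made-up comma-separated file. The file content and the object names d1 to d3 are hypothetical and not taken from the course files.

```r
# Hypothetical comma-separated file with a header line
csvfile <- tempfile(fileext = ".txt")
writeLines(c("gene,count,group",
             "A,10,ctrl",
             "B,25,treated"), csvfile)

# Defaults: fields are split on whitespace, so each line is a single field
d1 <- read.table(csvfile)
dim(d1)   # 3 rows, 1 column

# sep = "," splits the fields, but the header is still read as data
d2 <- read.table(csvfile, sep = ",")
dim(d2)   # 3 rows, 3 columns

# header = TRUE turns the first row into column names
d3 <- read.table(csvfile, sep = ",", header = TRUE)
dim(d3)   # 2 rows, 3 columns
colnames(d3)
```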
3- skipping lines
- Read ex6b_IO_skip.txt into object sk.
Is R complaining?
Check the file “manually” (in the RStudio file browser).
- The skip argument allows you to ignore one or more line(s) before reading in a file. Introduce this argument with the appropriate number of lines to skip, and read the file again.
Is R still complaining? What are the dimensions of sk now?
Change the default field separator. What are the dimensions of sk now?
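The effect of “skip” can be sketched on a made-up file that starts with two lines of free text. The file content and the names sk1 and sk2 are assumptions for illustration; the number of lines to skip in ex6b_IO_skip.txt has to be found by looking at that file.

```r
# Hypothetical file: two descriptive lines, then comma-separated data
skipfile <- tempfile(fileext = ".txt")
writeLines(c("This file was exported by a hypothetical tool",
             "Created for illustration only",
             "id,value",
             "a,1",
             "b,2"), skipfile)

# skip = 2 ignores the first two lines; whitespace is still the separator
sk1 <- read.table(skipfile, skip = 2)
dim(sk1)   # 3 rows, 1 column

# adding sep = "," splits each remaining line into two fields
sk2 <- read.table(skipfile, skip = 2, sep = ",")
dim(sk2)   # 3 rows, 2 columns
```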
4- Comment lines
- Read ex6b_IO_comment.txt into object cl.
Is R complaining again? Check the file manually and try to find out what is wrong…
What is the comment.char argument used for? Adjust the comment.char argument and read the file again.
- Also adjust the header and sep arguments to read in the file correctly. What are the dimensions of cl now?
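As a sketch of how comment.char works together with header and sep, the example below uses a made-up file in which comment lines start with “%” and fields are separated by “;”. Both characters are assumptions chosen for illustration; the ones used in ex6b_IO_comment.txt may differ.

```r
# Hypothetical file: "%" marks comment lines, ";" separates fields
cfile <- tempfile(fileext = ".txt")
writeLines(c("% hypothetical comment line",
             "sample;score",
             "s1;0.5",
             "% another comment",
             "s2;0.8"), cfile)

# Lines starting with the comment character are ignored when reading
cl_demo <- read.table(cfile, comment.char = "%", sep = ";", header = TRUE)
dim(cl_demo)   # 2 rows, 2 columns
cl_demo
```

By default, comment.char is “#” in read.table, so text after a “#” is dropped unless the argument is changed.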
5- final
- Read ex6b_IO_final.txt into object fin.
- Adjust the appropriate parameters according to what you have learnt, in order to obtain the data frame “fin” of dimensions 167 x 4.
11.3.3 Exercise 6c - I/O on a data frame
1- Read in file ex6c_input.txt into object ex6.
Warning: the file has a header!
Check the structure of ex6 (remember the str command).
2- Now read in the same file but, this time, set the argument as.is to TRUE.
Check the structure again: what has changed?
3- What are the column names of ex6?
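The effect of as.is can be sketched on a made-up tab-separated file. The file content and the names f1 and f2 are hypothetical. Note one assumption that matters here: since R 4.0.0 the default of stringsAsFactors is FALSE, so text columns are already read as character; the sketch sets stringsAsFactors = TRUE in the first call to reproduce the older behaviour and make the contrast visible.

```r
# Hypothetical tab-separated file with a header (invented values)
dfile <- tempfile(fileext = ".txt")
writeLines(c("State\tPopulation\tEurozone",
             "Xland\t10\tTRUE",
             "Yland\t20\tFALSE",
             "Zland\t5\tTRUE"), dfile)

# Text columns converted to factors (the default before R 4.0.0)
f1 <- read.table(dfile, header = TRUE, sep = "\t", stringsAsFactors = TRUE)
str(f1)

# as.is = TRUE keeps text columns as character vectors
f2 <- read.table(dfile, header = TRUE, sep = "\t", as.is = TRUE)
str(f2)
```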
4- Change the name of the first column of ex6 from “State” to “Country”.
5- How many countries are in the Eurozone, according to ex6?
Remember the table function.
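Steps 3 to 5 can be sketched on a toy data frame. The object “toy” and its values are invented for illustration and do not reflect the content of ex6c_input.txt.

```r
# Hypothetical data frame with the same column names as in the exercise
toy <- data.frame(State = c("Xland", "Yland", "Zland"),
                  Population = c(10, 20, 5),
                  Eurozone = c(TRUE, FALSE, TRUE))

# column names
colnames(toy)

# rename the first column only
colnames(toy)[1] <- "Country"

# table() counts how often each value occurs
table(toy$Eurozone)

# for a logical column, sum() counts the TRUE values
sum(toy$Eurozone)
```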
6- In the Eurozone column: replace TRUE with “yes” and FALSE with “no”.
correction
# select the Eurozone column
ex6$Eurozone
# elements of the Eurozone column that are exactly TRUE
ex6$Eurozone==TRUE
# extract actual values that are TRUE
ex6$Eurozone[ex6$Eurozone==TRUE]
# reassign all elements that are TRUE with "yes"
ex6$Eurozone[ex6$Eurozone==TRUE] <- "yes"
# same with FALSE
ex6$Eurozone[ex6$Eurozone==FALSE] <- "no"
7- In the column Country: how many country names from the list contain the letter “c” (upper- or lower-case)?
Remember the grep function. Check the help page.
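A minimal sketch of case-insensitive matching with grep, using an invented vector of names (“countries_demo” is hypothetical and not the Country column of ex6):

```r
# Hypothetical vector of names
countries_demo <- c("Carland", "Ticonia", "Borduria", "Zubrowka")

# grep() returns the indices of the elements matching the pattern;
# ignore.case = TRUE matches both "c" and "C"
hits <- grep("c", countries_demo, ignore.case = TRUE)
hits

# number of matching elements
length(hits)

# value = TRUE returns the matching elements instead of their indices
grep("c", countries_demo, ignore.case = TRUE, value = TRUE)
```

An equivalent pattern without ignore.case would be the character class "[cC]".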
8- According to that data frame, how many people live:
- in the European Union (whole table)?
- in the Eurozone?
correction
# sum the whole population column
sum(ex6$Population)
# select elements of ex6 where Eurozone is "yes"
ex6$Eurozone == "yes"
# select only elements in Population for which the corresponding Eurozone elements are "yes"
ex6$Population[ex6$Eurozone == "yes"]
# sum that selection
sum(ex6$Population[ex6$Eurozone == "yes"])
9- Write ex6 into file ex6c_output.txt
After each of the following steps, check the output file in the RStudio file browser (lower-right panel).
- Try with the default arguments.
- Add the argument “row.names” set to FALSE.
- Add the argument “quote” set to FALSE.
- Add the argument “sep” set to “\t” or to “,”.
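The effect of these write.table arguments can be sketched on a toy data frame written to a temporary file. The object “out_demo” and its values are invented for the example.

```r
# Hypothetical data frame and output file
out_demo <- data.frame(Country = c("Xland", "Yland"),
                       Population = c(10, 20))
outfile2 <- tempfile(fileext = ".txt")

# Defaults: row names are written, strings are quoted, fields separated by spaces
write.table(out_demo, file = outfile2)
default_lines <- readLines(outfile2)
default_lines

# No row names, no quotes, tab-separated fields
write.table(out_demo, file = outfile2,
            row.names = FALSE, quote = FALSE, sep = "\t")
clean_lines <- readLines(outfile2)
clean_lines
```

Each call overwrites the file, so the second readLines shows only the result of the last write.table call.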