“For the things we have to learn before we can do them, we learn by doing them.”

― Aristotle, The Nicomachean Ethics

Getting Started

Hover your mouse here to begin. Good work!
This book requires that you interact with it to learn. Hovering is the first step.

<- The Assignment Operator

Being able to save your work is important!

Usage    Keyboard Shortcut: Alt -

NameYouCreate <- some R commands

• <- (the less-than symbol < followed by a hyphen -) is called the assignment operator. It lets you store the results of some R commands in an object called NameYouCreate.
• NameYouCreate is any name that begins with a letter and uses only letters, numbers, periods, and underscores thereafter. To use spaces in the name, you must enclose the name in back-ticks, but this is not recommended.
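
As a small sketch of the naming rules above (the object names total_feet, speed.v2_mph, and my speeds are made up for illustration; cars is a dataset that comes with R):

```r
# Store the result of a calculation in an object called total_feet.
total_feet <- 3 * 5280
total_feet   # typing the name prints the stored value

# After the first letter, a name may contain numbers, periods, and underscores.
speed.v2_mph <- cars$speed

# A name containing a space only works inside back-ticks (legal, but awkward).
`my speeds` <- cars$speed
```

Every later use of a back-ticked name needs the back-ticks again, which is one reason names without spaces are easier to work with.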

Example Code

cars2 <- cars

• cars2: First we name the object we are creating. In this case, we are making a copy of the cars dataset, so it is logical to call it cars2, but it could be bob, c2, or any name you wanted to use. Just be careful not to use names that are already in use!
• <-: The assignment operator will take whatever is on the right-hand side and save it into the name written on the left-hand side.
• cars: In this case the cars dataset is being copied to cars2 so that we can change cars2 without changing the original cars dataset.
Press Enter to run the code.
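
To check that a copy really is independent of the original, here is a minimal sketch using the built-in cars dataset. Adding 1 to every speed is an arbitrary change chosen only for illustration:

```r
# Make the copy, then change only the copy.
cars2 <- cars
cars2$speed <- cars2$speed + 1

# The original speeds are untouched; the copied speeds are each 1 larger.
head(cars$speed, 3)
head(cars2$speed, 3)
```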

cars2$ftpersec <- cars2$speed * 5280 / 3600
View(cars2)

• cars2: The new copy of the cars dataset that we just created.
• $ftpersec: The $ selection operator can be used to create a new column in a dataset when used with the <- assignment operator.
• <-: The assignment operator will take the results of the right-hand side and save them into the name on the left-hand side.
• cars2$speed * 5280 / 3600: This calculation converts the miles per hour of the cars2 speed column into feet per second, because there are 5280 feet in a mile and 3600 seconds in an hour (60 minutes in an hour times 60 seconds in a minute).
• View(cars2): The cars2 dataset now contains a third column called ftpersec. Compare this to the original cars dataset to see how it changed.

c( ) The Combine Function

table( )

This is a way to quickly count how many times each value occurs in a column or columns.

Usage

table(NameOfDataset$columnName)

table(NameOfDataset$columnName1, NameOfDataset$columnName2)

• The table( ) function counts how many times each value in a column of data occurs.
• NameOfDataset is the name of a data set, like cars or airquality or KidsFeet.
• columnName is the name of a column from the data set.
• columnName1 and columnName2 are two different names of columns from the data set.
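
A short sketch of both forms of table( ), using the built-in airquality dataset. The object names monthCounts and hotDays, and the cutoff of 80 degrees, are made up for illustration:

```r
# One column: how many rows (days) were recorded in each month?
monthCounts <- table(airquality$Month)
monthCounts

# Two "columns": the first becomes the rows of the table, the second the columns.
# Here the second is a TRUE/FALSE value computed from the Temp column.
hotDays <- table(airquality$Month, airquality$Temp > 80)
hotDays
```

The counts in a table always add up to the number of rows in the dataset, as long as the columns contain no missing values.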

Example Code

speedCounts <- table(cars$speed)
speedCounts

• speedCounts <-: speedCounts is a new object being created using the assignment operator <- that will contain the counts of how many times each “speed” occurs in the cars data set speed column.
• table( ): The table function is being used in this case to count how many times each speed occurs in the cars data set speed column.
• cars: This is the name of the data set.
• $: The $ is used to access a given column from the data set.
• speed: This is the name of the column we are interested in from the cars data set.
• ): Always close off your functions in R with a closing parenthesis.
• speedCounts (second line): Typing the name of an object will print the results to the screen.
Press Enter to run the code.

library(mosaic)
birthdays <- table(KidsFeet$sex, KidsFeet$birthmonth)
birthdays

• library(mosaic): This is needed to access the KidsFeet data set that is used in this example. If you don’t have the mosaic library, you will need to run install.packages("mosaic") to install it first. From then on, you can load mosaic with the command library(mosaic). You need only install a package once, but you must load it with library( ) each time you wish to use it.
• birthdays <-: birthdays is a new object being created using the assignment operator <- that will contain the counts of how many birthdays occur in each month for each gender in the KidsFeet dataset.
• table( ): The table function is being used in this case to count how many birthdays occur in each month for children of each gender.
• KidsFeet: This is the name of the data set.
• $: The $ is used to access a given column from the data set.
• sex: This is the name of the column that will become the rows of our final table.
• , (comma): Separates the two columns of the data set you want to table.
• birthmonth: This is the name of the column that will become the columns of our final table.
• ): Always close off your functions in R with a closing parenthesis.
• birthdays (last line): Typing the name of an object will print the results to the screen.
Press Enter to run the code.

select( )

Used to select out certain columns from a dataset.

Usage

select(NameOfDataset, listOfColumnNames)

• select( ) is the function that selects out certain columns of the dataset.
• NameOfDataset is the name of a dataset, like cars or airquality or KidsFeet.
• listOfColumnNames is a vector of names of columns from the dataset, usually supplied inside a combine c(...) statement.
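
As a sketch of the usage above on a dataset that comes with R, the following selects three columns from airquality. The name airTemps is made up, and library(dplyr) is used here on the assumption that it is installed; it is the part of library(tidyverse) that provides select( ):

```r
library(dplyr)   # library(tidyverse) would load dplyr as well

# Keep only the Month, Day, and Temp columns, in that order.
airTemps <- select(airquality, c(Month, Day, Temp))
head(airTemps)
```

The number of rows is unchanged; select( ) only reduces (and reorders) the columns.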

Example Code

KidsNameBirth <- select(KidsFeet, c(name, birthyear, birthmonth))

• KidsNameBirth <-: KidsNameBirth is a name we made up. The assignment operator <- will save the reduced version of the KidsFeet dataset created by the select(...) function into this name.
• select(KidsFeet, : “select” is a function from library(tidyverse) that selects out specified columns from the original dataset in the order specified.
• c(name, birthyear, birthmonth): The columns of the KidsFeet dataset that we want to select out of the original dataset. Notice how the combine function c(...) is used to list out the columns we want.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.

KidsBigLength <- select(KidsFeet, c(biggerfoot, length))

• KidsBigLength <-: KidsBigLength is a name we made up. The assignment operator <- will save the reduced version of the KidsFeet dataset created by the select(...) function into this name.
• select(KidsFeet, : “select” is a function from library(tidyverse) that selects out specified columns from the original dataset in the order specified.
• c(biggerfoot, length): The columns of the KidsFeet dataset that we want to select out of the original dataset. The order in which columns are selected is the order in which they are placed in the new data set.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.

%>% The Pipe Operator

Just like the pipes in your kitchen sink, the pipe operator takes “water from the sink” and “sends it down to somewhere else.”

Usage    Keyboard Shortcut: Ctrl Shift M

NameOfDataset %>%

some R commands that follow on the next line

• %>%, the pipe operator, is created by typing percent symbols % on both sides of a greater than symbol >. It lets you take whatever is on the left of the symbol and “pipe it down into” some R commands that follow on the next line.
• NameOfDataset is the name of a dataset, like cars or airquality or KidsFeet.

Note: you should load library(tidyverse) before using the %>% operator.
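
One way to see what the pipe does is to write the same commands with and without it. This is a sketch on the built-in airquality dataset; the names nested and piped are made up, and library(dplyr) is assumed to be installed (library(tidyverse) loads it too):

```r
library(dplyr)   # provides %>%, filter( ), and select( )

# Without the pipe: commands are nested and must be read from the inside out.
nested <- select(filter(airquality, Month == 5), c(Day, Temp))

# With the pipe: the dataset flows down through each command in turn.
piped <- airquality %>%
  filter(Month == 5) %>%
  select(c(Day, Temp))

# Both approaches build the same dataset.
identical(nested, piped)
```

The piped version reads top to bottom in the order the steps happen, which is the main reason to prefer it.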

Example Code

Kids2 <- KidsFeet %>%
  filter(birthyear == 87) %>%
  select(c(name, birthyear, length))

• Kids2 <-: This provides a name for the new reduced version of the KidsFeet dataset that is going to be created by the combined use of filter(...) and select(...).
• KidsFeet: KidsFeet is a dataset found in library(mosaic).
• %>% (first line): The pipe operator that will send the KidsFeet dataset down inside of the code on the following line.
• filter( : “filter” is a function from library(tidyverse) that allows us to reduce the number of rows in the KidsFeet dataset by filtering according to certain criteria.
• birthyear: Represents the column of data that we want to use to reduce the rows of the dataset.
• == 87: This is the “filtering rule”. It will filter the data down to just those children who had a birthyear equal to 87.
• %>% (second line): The pipe operator that will send the filtered version of the KidsFeet dataset down inside of the code on the following line.
• select( : “select” is a function from library(tidyverse) that selects out specified columns from the current dataset in the order specified.
• c(name, birthyear, length): The columns of the filtered KidsFeet dataset that we want to select. Notice how the combine function c(...) is used to list out the columns we want.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.

summarise( ) and group_by( )

Compute numerical summaries on data or on groupings within the data.

Usage

NameOfDataset %>%

summarise(nameYouLike = some_stats_function(columnName))

OR

NameOfDataset %>%

group_by(columnGroupsName) %>%

summarise(nameYouLike = some_stats_function(columnName))

• NameOfDataset is the name of a dataset, like cars or airquality or KidsFeet.
• %>% is the pipe operator that “pipes data” down into R commands on the next line.
• group_by(...) is an R function from library(tidyverse) that groups data according to a specified column (or columns).
• summarise(...) is an R function from library(tidyverse) that computes numerical summaries on data or groups of data.
• columnGroupsName is the name of a column that represents qualitative (categorical) data. This column is used to separate the dataset into little datasets, one “little dataset” for each group or category in the columnGroupsName column.
• nameYouLike is just that. Some name you come up with.
• some_stats_function(...) is a stats function like mean(...), sd(...), n( ), and so on. Note that n( ) takes nothing inside its parentheses; it simply counts rows.
• columnName is the name of a column from the dataset that you want to compute numerical summaries on.
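
Here is a sketch of the grouped form on the built-in airquality dataset. The names tempByMonth, aveTemp, sdTemp, and sampleSize are made up, and library(dplyr) is assumed to be installed. Month is stored as a number in airquality, but it works as a grouping column because it only takes a handful of distinct values:

```r
library(dplyr)   # library(tidyverse) would load dplyr as well

# One row of summaries for each month in the data.
tempByMonth <- airquality %>%
  group_by(Month) %>%
  summarise(aveTemp = mean(Temp),   # average temperature in the month
            sdTemp = sd(Temp),      # standard deviation of temperature
            sampleSize = n())       # number of rows (days) in the month
tempByMonth
```

Leaving out the group_by(Month) line would give a single row of summaries for the whole dataset.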

Example Code

KidsFeet %>%
  summarise(aveLength = mean(length))

• KidsFeet: KidsFeet is a dataset found in library(mosaic).
• %>%: The pipe operator that will send the KidsFeet dataset down inside of the code on the following line.
• summarise( : “summarise” is a function from library(tidyverse) that allows us to compute numerical summaries on data.
• aveLength: A name we came up with that will store the results of the numerical summary.
• = mean(length): This computes the mean(...) of the length column from the KidsFeet dataset.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.

KidsFeet %>%
  summarise(aveLength = mean(length),
            sdLength = sd(length),
            sampleSize = n())

• KidsFeet: KidsFeet is a dataset found in library(mosaic).
• %>%: The pipe operator that will send the KidsFeet dataset down inside of the code on the following line.
• summarise( : “summarise” is a function from library(tidyverse) that allows us to compute numerical summaries on data.
• aveLength, sdLength, sampleSize: Names we came up with that will store the results of each numerical summary.
• = mean(length), : This computes the mean(...) of the length column from the KidsFeet dataset.
• = sd(length), : This computes the sd(...) of the length column from the KidsFeet dataset.
• = n( ): This counts the number of rows, which gives the sample size. Unlike mean(...) and sd(...), n( ) takes no column name.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.

KidsFeet %>%
  group_by(sex) %>%
  summarise(aveLength = mean(length),
            sdLength = sd(length),
            sampleSize = n())

• KidsFeet: KidsFeet is a dataset found in library(mosaic).
• %>% (first line): The pipe operator that will send the KidsFeet dataset down inside of the code on the following line.
• group_by( : “group_by” is a function from library(tidyverse) that allows us to split the dataset up into “little groups” according to the column specified.
• sex: “sex” is a column from the KidsFeet dataset that records the gender of each child.
• %>% (second line): The pipe operator that will send the version of the KidsFeet dataset grouped by gender down inside of the code on the following line.
• summarise( : “summarise” is a function from library(tidyverse) that allows us to compute numerical summaries on data.
• aveLength, sdLength, sampleSize: Names we came up with that will store the results of each numerical summary.
• = mean(length), : This computes the mean(...) of the length column for each group.
• = sd(length), : This computes the sd(...) of the length column for each group.
• = n( ): This counts the number of rows in each group, which gives the sample size of each group.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.

For more uses of summarise(...) and group_by(...), see the example code in the various “R Instructions” sections of the Numerical Summaries page.

arrange( )

Arrange data by a certain column, or columns, i.e. “sort” the data.

Usage

NameOfDataset %>%

arrange(columnName1)

Note: arrange(columnName1, columnName2, ...) is also possible.

• NameOfDataset is the name of a dataset, like cars or airquality or KidsFeet.
• %>% is the pipe operator that “pipes data” down into R commands on the next line.
• arrange(...) is an R function from library(tidyverse) that arranges a data set by order for the column given.
• columnName1 is the name of the column from the dataset that you want to sort by.
• columnName2 is the name of a second column from the dataset, used to order rows that have the same value in columnName1.
• ... implies that you can arrange by as many columns as you want.
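
A sketch of arranging by two columns at once, using the built-in airquality dataset. The name sortedAir is made up, and library(dplyr) is assumed to be installed:

```r
library(dplyr)   # library(tidyverse) would load dplyr as well

# Sort by Month from lowest to highest; within each month,
# sort by Temp from highest to lowest using desc( ).
sortedAir <- airquality %>%
  arrange(Month, desc(Temp))
head(sortedAir)
```

The first column listed controls the overall order; each later column only matters for rows that are tied on the columns before it.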

Example Code

KidsFeet %>%
  arrange(birthmonth)

• KidsFeet: KidsFeet is a dataset found in library(mosaic).
• %>%: The pipe operator that will send the KidsFeet dataset down inside of the code on the following line.
• arrange( : “arrange” is an R function from library(tidyverse) that arranges a data set by order for the column given.
• birthmonth: birthmonth is the name of one of the columns of the KidsFeet data set. Specifying this name will cause the data to be sorted by birthmonth from 1 to 12.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.

KidsFeet %>%
  arrange(desc(birthmonth))

• KidsFeet: KidsFeet is a dataset found in library(mosaic).
• %>%: The pipe operator that will send the KidsFeet dataset down inside of the code on the following line.
• arrange( : “arrange” is an R function from library(tidyverse) that arranges a data set by order for the column given.
• desc( : This causes the arranging to be done in descending order (highest to lowest).
• birthmonth: birthmonth is the name of one of the columns of the KidsFeet data set. Because it is wrapped in desc( ), the data will be sorted by birthmonth from 12 down to 1.
• ) ): Always close off your functions in R with a closing parenthesis, one for desc( and one for arrange(.
Press Enter to run the code.

pander( )

Makes output of most commands “beautiful”.

Usage

library(pander) then…

pander(someCode)

OR

someCode %>%

pander( )

Note: pander(stuff, caption="Some useful caption", ...) is also possible.

• someCode is exactly that, some coding you have done that creates output that you want displayed nicely.
• %>% is the pipe operator that “pipes data” down into R commands on the next line.
• pander(...) is an R function from library(pander) that makes most R output look nice.
• ... stands for other useful options, like split.table=Inf.
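
A minimal sketch of pander( ) with a caption, using the built-in airquality dataset so that no extra data package is needed. The caption text is made up, and the pander package is assumed to be installed:

```r
library(pander)

# Count the days recorded in each month, then print the counts as a formatted table.
pander(table(airquality$Month),
       caption = "Number of Days Recorded in Each Month")
```

In the console this prints a plain-text table; the formatting pays off when the code is run inside a document that is knitted to HTML or a similar format.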

Example Code

pander(table(KidsFeet$sex, KidsFeet$birthmonth), caption="Counts of Birthdays by Month")

• pander( : pander is an R function that makes output look nice.
• table(KidsFeet$sex, KidsFeet$birthmonth), : Code that makes a table of how many boys and girls were born in each month of the year.
• caption="Counts of Birthdays by Month": The caption=" " command is very useful for giving your output a small title.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.

KidsFeet %>%
  group_by(sex) %>%
  summarise(aveLength = mean(length),
            sdLength = sd(length),
            sampleSize = n()) %>%
  pander(caption="Doesn't that look nice?")

• KidsFeet: KidsFeet is a dataset found in library(mosaic).
• %>% (first line): The pipe operator that will send the KidsFeet dataset down inside of the code on the following line.
• group_by( : “group_by” is a function from library(tidyverse) that allows us to split the dataset up into “little groups” according to the column specified.
• sex: “sex” is a column from the KidsFeet dataset that records the gender of each child.
• %>% (second line): The pipe operator that will send the version of the KidsFeet dataset grouped by gender down inside of the code on the following line.
• summarise( : “summarise” is a function from library(tidyverse) that allows us to compute numerical summaries on data.
• aveLength, sdLength, sampleSize: Names we came up with that will store the results of each numerical summary.
• = mean(length), : This computes the mean(...) of the length column for each group.
• = sd(length), : This computes the sd(...) of the length column for each group.
• = n( ): This counts the number of rows in each group, which gives the sample size of each group.
• %>% (after summarise): The pipe operator that will send the table of summaries down inside of the code on the following line.
• pander( : The pander function will make the output of the above code look nice.
• ): Always close off your functions in R with a closing parenthesis.
Press Enter to run the code.