There are many ways to display data. The fundamental idea is that the graphical depiction of data should communicate the truth the data has to offer about the situation of interest.

### Histograms

1 Quantitative Variable

#### Overview

Great for showing the distribution of data for a single quantitative variable when the sample size is large. Dotplots are a good alternative for smaller sample sizes. Gives a good feel for the mean and standard deviation of the data.

#### Explanation

Histograms group data that are close to each other into “bins” (the vertical bars in the plot). The height of a bin is determined by the number of data points that are contained within the bin. For example, if we group together all the sections of the book of scripture known as the Doctrine and Covenants that occurred in a given year (Jan. 1st - Dec. 31st) then we get the following counts.

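| Year | Number of Sections |
|------|--------------------|
| 1823 | 1 |
| 1824 | 0 |
| 1825 | 0 |
| 1826 | 0 |
| 1827 | 0 |
| 1828 | 1 |
| 1829 | 16 |
| 1830 | 19 |
| 1831 | 37 |
| 1832 | 16 |
| 1833 | 12 |
| 1834 | 5 |
| 1835 | 3 |
| 1836 | 4 |
| 1837 | 1 |
| 1838 | 8 |
| 1839 | 3 |
| 1840 | 0 |
| 1841 | 3 |
| 1842 | 2 |
| 1843 | 4 |
| 1844 | 1 |
| 1845 | 0 |
| 1846 | 0 |
| 1847 | 1 |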

*Note that Section 138 occurred in 1918 and is removed from this example.

In this example, each “bin” spans one calendar year (Jan. 1 - Dec. 31). Since “dates” can be used as quantitative data, it makes sense to make a histogram of these data. (Remember, histograms are only for quantitative data.)

Notice in the histogram of these data that the left edge of each bin is on the year the data correspond to. The right edge of the bin lands on the following year. For example, the first bin has its left edge on 1823 and its right edge on 1824. Since there was one revelation in 1823, this bin has a height of 1. The bin that has 1831 on the left and 1832 on the right shows that 37 revelations occurred in 1831. It is powerful to notice the number of revelations occurring around 1830, the year the Church of Jesus Christ of Latter-day Saints was organized.
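The binning described above can be sketched in base R. This is a minimal illustration, not necessarily how the original plot was produced: it uses the base `hist()` function, and the names `years`, `sections`, and `section_year` are made up for this example. The counts are copied from the table.

```r
# Counts copied from the table: one entry per year, 1823 through 1847.
years    <- 1823:1847
sections <- c(1, 0, 0, 0, 0, 1, 16, 19, 37, 16, 12, 5, 3,
              4, 1, 8, 3, 0, 3, 2, 4, 1, 0, 0, 1)

# Expand the counts into one observation per section (the year it was given).
section_year <- rep(years, times = sections)

# One bin per year: left edge on the year, right edge on the following year.
# right = FALSE makes each bin include its left edge and exclude its right edge.
h <- hist(section_year,
          breaks = 1823:1848,
          right  = FALSE,
          main   = "Doctrine and Covenants sections by year",
          xlab   = "Year",
          ylab   = "Number of sections")

# The bin heights reproduce the counts in the table.
h$counts
```

Setting `breaks` explicitly is what ties each bar to a single year. Left to its defaults, `hist()` would choose its own bin widths and could group several years into one bar.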

### Boxplots

1 Quantitative Variable | 2+ Groups

#### Overview

Graphical depiction of the five-number summary. Great for comparing the distributions of data across several groups or categories. Provides a quick visual understanding of the location of the median as well as the range of the data. Can be useful in showing outliers. Sample size should be larger than five, or computing the five-number summary is not very meaningful. Side-by-side dotplots are a good alternative for smaller sample sizes.

#### Explanation

Understanding how a boxplot is created is the best way to understand what the boxplot shows.

1. The five-number summary is computed.
2. A box is drawn with one edge located at the first quartile and the opposite edge located at the third quartile.
3. This box is then divided into two boxes by placing another line inside the box at the location of the median.
4. The maximum value and minimum value are marked on the plot.
5. Whiskers are drawn from the first quartile out towards the minimum and from the third quartile out towards the maximum.
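6. If the minimum or maximum is too far away (by the usual convention, more than 1.5 times the interquartile range beyond the edge of the box), then the whisker is ended early, at the most extreme data point that is still within that distance.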
7. Any points beyond the line ending the whisker are marked on the plot as dots. This helps identify possible outliers in the data.
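The steps above can be traced in base R. This is a rough sketch on a hypothetical sample `x` (made up for illustration). It assumes R's default whisker rule (`coef = 1.5` in `boxplot.stats()`), and note that `fivenum()` uses Tukey's hinges, which can differ slightly from the quartiles returned by `quantile()`.

```r
# Hypothetical sample: nine evenly spaced values plus one unusually large value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Step 1: the five-number summary (minimum, Q1, median, Q3, maximum).
five <- fivenum(x)

# Steps 2-7: boxplot.stats() reports where the whiskers and box edges fall
# ($stats) and which points lie beyond the whiskers ($out).
bs <- boxplot.stats(x)   # default coef = 1.5 limits how far a whisker may reach

five       # the maximum is 30
bs$stats   # the upper whisker ends early, at 9
bs$out     # 30 is drawn as a separate dot

# Draw the boxplot; the point beyond the upper whisker appears as a dot.
boxplot(x, horizontal = TRUE)

# Side-by-side boxplots for several groups use a formula: response ~ group.
boxplot(Temp ~ Month, data = airquality)
```

Comparing `five` with `bs$stats` shows the effect of step 6: the five-number summary keeps the true maximum, while the whisker stops at the largest value that is not flagged as a possible outlier.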

### Scatterplots

2 Quantitative Variables

#### Overview

Depicts the actual values of the data points, which are $$(x,y)$$ pairs. Works well for small or large sample sizes. Visualizes well the correlation between the two variables. Should be used in linear regression contexts whenever possible.
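As a quick illustration of the link between a scatterplot, correlation, and linear regression, here is a minimal base R sketch. It uses the built-in `airquality` dataset, with `Wind` as the explanatory variable and `Temp` as the response. The names `r` and `fit` are chosen only for this example.

```r
# airquality ships with R; Wind and Temp are both quantitative.
plot(Temp ~ Wind, data = airquality,
     xlab = "Wind", ylab = "Temp")

# Correlation between the two variables shown in the plot.
r <- cor(airquality$Wind, airquality$Temp)

# Fit a simple linear regression and overlay the fitted line on the scatterplot.
fit <- lm(Temp ~ Wind, data = airquality)
abline(fit)

r
```

Plotting first, before trusting `r` or the fitted line, makes it easier to see whether a straight line is a reasonable description of the pattern.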

#### R Instructions

To make a scatterplot in R using the ggplot approach, first ensure the ggplot2 package is loaded:

library(ggplot2)

Then use:

ggplot(data, aes(x=dataColumn1, y=dataColumn2)) +
  geom_point()

• data is the name of your dataset.
• dataColumn1 is a column of data from your dataset that is quantitative and will be used as the explanatory variable.
• dataColumn2 is a column of data from your dataset that is quantitative and will be used as the response variable.
• The aesthetic helper function aes(x= , y=) is how you tell the ggplot to use the values in dataColumn1 for the x-axis and the values in dataColumn2 for the y-axis.
• The geometry helper function geom_point() causes the ggplot to become a scatterplot.

Example Code

ggplot(airquality, aes(x=Wind, y=Temp)) +
  geom_point()

• ggplot( is an R function used to create a framework for a graphic that will have elements added to it with the + sign. The opening parenthesis begins the function and must touch the last letter of the function name.
• airquality is a dataset. Type View(airquality) in R to see it.
• The comma allows us to specify optional commands to the function. The space after the comma is not required. It just looks nice.
• aes( is the “aesthetics” function. It allows you to tell the ggplot how it should appear. This includes things like what the x-axis or y-axis should become.
• x=Wind declares which variable will become the x-axis of the graphic, the explanatory variable.
• y=Temp declares which variable will become the y-axis of the graphic.
• The first ) is the closing parenthesis for the aes function. The second ) is the closing parenthesis for the ggplot function.
• The addition symbol + is used to add further elements to the ggplot.
• geom_point() causes the ggplot to become a scatterplot. There are many other “geom_” functions that could be used. Its ) is the closing parenthesis for the geom_point function.

ggplot(airquality, aes(x=Wind, y=Temp)) +
  geom_point(color="ivory3", pch=18) +
  labs(title="La Guardia Airport (May - Sep)",
       x="Daily Average Wind Speed (mph)",
       y="Daily Mean Temperature") +
  theme_bw()

The ggplot(, aes(, and geom_point( pieces work exactly as in the first example. The new pieces are:

• color="ivory3" controls the color of the dots.
• pch=18 controls the type of plotting character to be used in the plot.
• Each addition symbol + is used to add a further element to the ggplot.
• labs( is used to add labels to the plot, like a main title, x-label and y-label. Its ) is the closing parenthesis for the labs function.
• title="La Guardia Airport (May - Sep)" controls the main title at the top of the graphic.
• x="Daily Average Wind Speed (mph)" controls the x-label of the graphic.
• y="Daily Mean Temperature" controls the y-label of the graphic.
• theme_bw() changes the “theme” or look of the plot to “black” and “white”.

To make a scatterplot in R using the plotly approach, first ensure the plotly package is loaded:

library(plotly)

Then use:

plot_ly(data, x= ~dataColumn1, y= ~dataColumn2)

• data is the name of your dataset.
• dataColumn1 is a column of data from your dataset that is quantitative and will be used as the explanatory variable.
• dataColumn2 is a column of data from your dataset that is quantitative and will be used as the response variable.

Example Code

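# Basic scatterplot of the built-in airquality dataset.
plot_ly(airquality, x= ~Wind, y= ~Temp)

# KidsFeet is provided by the mosaicData package.
library(mosaicData)

# Color by sex, size by birth month, and show name and birth month on hover.
plot_ly(KidsFeet,
        x= ~length,
        y= ~width,
        color= ~sex,
        size= ~birthmonth,
        text= ~paste("Name:", name, "\n", "Birth-Month:", birthmonth),
        colors=c("skyblue","hotpink")) %>%
  layout(title="KidsFeet dataset",
         xaxis=list(title="Length of the longer foot in cm"),
         yaxis=list(title="Width of the longer foot in cm"))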

### Custom Plots

Creativity Required

#### Overview

Sometimes no standard plot sufficiently describes the data. In these cases, the only guideline is the one stated originally, “the graphical depiction of data should communicate the truth the data has to offer about the situation of interest.”

#### R Examples

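# Density of CO2 uptake for the Quebec plants.
plot(density(CO2$uptake[CO2$Type=="Quebec"]))
# Overlay the density for the Mississippi plants in a second color.
lines(density(CO2$uptake[CO2$Type=="Mississippi"]),
      col='firebrick')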