rcode02

Displaying data: R code for Chapter 2 examples

Download the R code on this page as a single file here.

New methods on this page

Click on a function argument for a short description of its meaning. The variable names are plucked from the examples further below.

Read data from a comma-separated (.csv) file into a data frame. R will take columns having only numbers and make them numeric variables in the data frame. Columns having characters will be converted to variables, rather than to character variables.
<- read.csv()

Inspect data in a data frame. The head function shows the first few lines of the data frame to confirm the data was read properly. The nrow function indicates the number of cases (rows) in the data frame.
head()
nrow()

Basic strip chart relating a numeric variable to a categorical variable:
stripchart(, , ,
    )

Frequency table or contingency table:
table()

Bar graph for count data:
barplot(, , )

Histogram for a numeric variable:
hist(, )

Scatter plot showing association between two numeric variables:
plot(, )

Other new methods:
Multiple-histogram plot.
Line plot.
Load an R package.
Set the desired ordering of categories of a factor variable in tables and graphs.

Figure 2.1-2. Locust serotonin

Strip chart of serotonin levels in the central nervous system of desert locusts that were experimentally crowded for 0 (the control group), 1, and 2 hours.

Read the data and store in data frame (here named locustData). The following command uses read.csv to grab the data from a file on the internet (on the current web site).

locustData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_2locustSerotonin.csv"))

Show the first few lines of the data, to ensure it read correctly. Determine the number of cases in the data.

head(locustData)
nrow(locustData)

Draw a stripchart (the tilde "~" means that the first argument below is a formula, relating one variable to the other).

stripchart(serotoninLevel ~ treatmentTime, data = locustData, method = "jitter", 
	vertical = TRUE)

We show how to draw a fancier strip chart, closer to that shown in Figure 2.1-2, by including more options.


Figure 2.1-3. Education spending

A bar graph of education spending per student in different years in British Columbia.

Read the data into a data frame named educationSpending, and inspect.

educationSpending <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f1_3EducationSpending.csv"))
head(educationSpending)

Draw a bar graph. The names argument generates numbers between 1998 and 2004 in 1-year increments, to be used as labels along the horizontal axis.

barplot(educationSpending$spendingPerStudent, names.arg = educationSpending$year,
	ylab = "Education spending ($ per student)")

A slightly fancier bar graph like that in Figure 2.1-3, which includes additional options, is shown .


Example 2.2A. Deaths from tigers

Frequency table and bar graph showing activities of 88 people at the time they were killed by tigers near Chitwan National Park, Nepal, from 1979 to 2006.

Read the data into data frame named tigerData

tigerData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2aDeathsFromTigers.csv"))
head(tigerData)

Generate a frequency table. The sort function is included to sort the categories by their frequencies.

tigerTable <- sort(table(tigerData$activity), decreasing = TRUE)
tigerTable

You can arrange the frequency table vertically.

data.frame(Frequency = tigerTable)

Use the addmargins command to include sums in your frequency table.

data.frame(Frequency = addmargins(tigerTable))

Draw a bar graph. The additional arguments cex.names = 0.5 shrinks the axis labels by 50%, and las = 2 flips the labels, so that they all fit in the window.

barplot(tigerTable, ylab = "Frequency", cex.names = 0.5, las = 2)

A slightly fancier bar graph of the data with more options, like that shown in Figure 2.2-1, is shown .


Example 2.2B. Desert bird abundance

Frequency table and histogram illustrating the frequency distribution of bird species abundance at Organ Pipe Cactus National Monument.

Read and inspect the data.

birdAbundanceData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e2bDesertBirdAbundance.csv"))
head(birdAbundanceData)

Generate a frequency table of the numeric bird abundance variable. The option right = FALSE ensures that abundance value 300 (for example) is counted in the 300-400 bin rather than in the 200-300 bin.

birdAbundanceTable <- table(cut(birdAbundanceData$abundance, 
	breaks = seq(0,650,by=50), right = FALSE))
birdAbundanceTable

The same table oriented vertically and including the sum.

data.frame(Frequency = addmargins(birdAbundanceTable))

Draw a histogram of bird abundances.

hist(birdAbundanceData$abundance, right = FALSE)

Commands to draw a histogram with more options, such as Figure 2.2-3, are .


Figure 2.2-5. Salmon body size

Histograms of body mass of 228 female sockeye salmon sampled from Pick Creek in Alaska. The same data are plotted for three different interval widths: 0.1 kg, 0.3 kg, and 0.5 kg.

Read the data into data frame named salmonSizeData

salmonSizeData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02f2_5SalmonBodySize.csv"))
head(salmonSizeData)

Histograms with different bin widths (0.1, 0.3, and 0.5).

hist(salmonSizeData$massKg, right = FALSE, breaks = seq(1,4,by=0.1), col = "firebrick")
hist(salmonSizeData$massKg, right = FALSE, breaks = seq(1,4,by=0.3), col = "firebrick")
hist(salmonSizeData$massKg, right = FALSE, breaks = seq(1,4,by=0.5), col = "firebrick")

Example 2.3A. Bird malaria

Contingency table, grouped bar plot, and mosaic plot showing the association between egg removal treatment and incidence of malaria in female great tits.

Read the data from the data file and inspect.

birdMalariaData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3aBirdMalaria.csv"))
head(birdMalariaData)

Optional step: Set the desired order of treatment categories in tables and graphs. The factor command allows us to order the levels so that the category "Egg removal" comes before "Control" in tables and graphs (categories are otherwise ordered alphabetically).

birdMalariaData$treatment <- factor(birdMalariaData$treatment, 
	levels= c("Egg removal", "Control"))

Create a contingency table. Use table with the names of two categorical variables as arguments, beginning with the response variable.

birdMalariaTable <- table(birdMalariaData$response, birdMalariaData$treatment)
birdMalariaTable

Include row and column sums in the contingency table.

addmargins(birdMalariaTable, margin = c(1,2), FUN = sum, quiet = TRUE)

Draw a grouped bar graph using the contingency table.

barplot(as.matrix(birdMalariaTable), beside = TRUE)

Commands to produce a grouped bar graph with more options are shown .

Draw a mosaic plot. The "t" command flips (transposes) the table to ensure that the explanatory variable is along the horizontal axis.

mosaicplot( t(birdMalariaTable), col = c("firebrick", "goldenrod1"), 
	sub = "Treatment", ylab = "Relative frequency", 
	cex.axis = 1.1, main = "")

Example 2.3B. Guppy father and son comparison

Scatter plot of the relationship between the ornamentation of male guppies and the average attractiveness of their sons.

Read the data from the file.

guppyFatherSonData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3bGuppyFatherSonAttractiveness.csv"))
head(guppyFatherSonData)

Draw a scatter plot using the formula approach (with the "~").

plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData)

See for commands to draw a fancier scatter plot like that in Figure 2.3-3.


Example 2.3C. Human hemoglobin and elevation

Strip chart, box plot, and multiple histograms showing hemoglobin concentration in men living at high altitude in three different parts of the world and in a sea level population (USA).

Read the data.

hemoglobinData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e3cHumanHemoglobinElevation.csv"))
head(hemoglobinData)

Obtain the sample sizes in each of the four populations

table(hemoglobinData$population)

Draw a strip chart of the hemoglobin data.

stripchart(hemoglobin ~ population, data = hemoglobinData, method = "jitter",
    vertical = TRUE)

Commands for a fancier strip chart like that shown in Figure 2.3-4 are .

Draw a box plot, relating hemoglobin to population.

boxplot(hemoglobin ~ population, data = hemoglobinData)

A box plot with more options, like that in Figure 2.3-4, is shown here.

par(bty = "l")
boxplot(hemoglobin ~ population, data = hemoglobinData,
	col = "goldenrod1", boxwex = 0.5, whisklty = 1, outcol = "black", 
	outcex = 1, outlty = "blank", las = 1, 
	xlab="Male population", ylab = "Hemoglobin concentration (g/dL)")

Show the association between hemoglobin and population using the multiple histograms approach (Figure 2.3-5). The following commands use the lattice package, which must first be loaded.

library(lattice)
histogram(~ hemoglobin | population, data = hemoglobinData,
	layout = c(1,4), col = "firebrick", breaks = seq(10,26,by=1))

we show commands to draw the multiple histograms in basic R, without using the lattice package. This approach is more tedious, but the resulting graphs are often easier to modify.


Example 2.4A. Measles outbreaks

Line graph showing confirmed cases of measles in England and Wales from 1995 to 2011. Annual counts are quarterly.

Read the data from file.

measlesData <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter02/chap02e4aMeaslesOutbreaks.csv"))
head(measlesData)

Drawing a line graph uses the same command as a scatter plot but adds the type = "l" argument to draw a line graph instead.

plot(confirmedCases ~ yearByQuarter, data = measlesData, type="l") 

Commands to draw a fancier line plot (like that in Figure 2.4-1) are shown .

A factor is a special variable type in R that keeps track of known categories (“levels”) of a categorical variable.
The name you choose for the R data frame where the data from the file will be stored.
Name of a data file in comma-separated format. Replace this with the exact phrase file.choose() if instead you want a window to pop up that lets you select the file on your computer by point-and-click.
Name of the data frame.
Name of the data frame.
The "~" indicates that this is a formula relating a numeric response variable (serotoninLevel) on the y-axis to a categorical explanatory variable (treatmentTime) on the x-axis.
Indicates the name of the data frame containing the two variables.
Displaces the points a small random amount to reduce overlap.
For a vertical rather than horizontal plot.
The name of the categorical variable being tabulated. For a contingency table, include the names of two categorical variables, separated by comma.
Name of the frequency table or variable being plotted.
A label for the vertical axis.
Shrinks the x-axis labels by 50% so that they fit in the window.
The name of the numeric variable.
This option makes the histogram bins "right-open". This means that abundance value 300 (for example) is counted in the 300-400 bin rather than in the 200-300 bin.
A formula indicating the name of the variable to be plotted on the vertical axis (named on the left of the "~") and the variable to be plotted on the horizontal axis.
The name of the data frame containing the two variables.

# Stripchart with options
par(bty = "l") # plot x and y axes only, not a complete box
stripchart(serotoninLevel ~ treatmentTime, data = locustData, vertical = TRUE, 
	method = "jitter", pch = 16, col = "firebrick", cex = 1.5, las = 1,
	ylab = "Serotonin (pmoles)", xlab = "Treatment time (hours)",
	ylim = c(0, max(locustData$serotoninLevel)))
# The following command calculates the means in each treatment group (time)
meanSerotonin = tapply(locustData$serotoninLevel, locustData$treatmentTime, mean)
# "segments" draws draws lines to indicate the means
segments(x0 = c(1,2,3) - 0.1, y0 = meanSerotonin, x1 = c(1,2,3) + 0.1, 
	y1 = meanSerotonin, lwd = 2)


# Bar graph with more options
barplot(educationSpending$spendingPerStudent, names.arg = educationSpending$year, 
	las = 1, col = "firebrick", ylim = c(0,8000),
	ylab = "Education spending ($ per student)")


oldpar = par(no.readonly = TRUE) # stores a backup copy of current graph settings in "oldpar"
par(mar = c(8, 4, 4, 2) + 0.1)   # creates more room for labels below the x-axis
barplot(tigerTable, las = 2, col = "firebrick", cex.names = 0.8, ylim = c(0,50), 
	xlab = "", ylab = "Frequency (number of people)")
mtext("Activity", side = 1, line = 7, cex = 0.9) # adds the text under the x-axis
par(oldpar)  # reverts graph settings back to default


# Histogram with options
hist(birdAbundanceData$abundance, right = FALSE, breaks = seq(0,650,by=50),	
	col = "firebrick", las = 1, xlab = "Abundance (No. individuals)",  
	ylab = "Frequency (No. species)", main = "")


# Grouped bar graph with options
barplot(as.matrix(birdMalariaTable), beside = TRUE, space = c(0.1, 0.3),
	las = 1, xlab = "Treatment", ylab = "Frequency",
	col = c("firebrick", "goldenrod1"), legend.text = rownames(birdMalariaTable), 
	args.legend = list(x = 0.3, y = max(birdMalariaTable), xjust = 0))


# Scatter plot with options
plot(sonAttractiveness ~ fatherOrnamentation, data = guppyFatherSonData, 
	las = 1, pch = 16, col = "firebrick", cex = 1.5, bty = "l", 
	xlab = "Father's ornamentation", ylab = "Son's attractiveness")


par(bty = "l")
stripchart(hemoglobin ~ population, data = hemoglobinData, 
	vertical = TRUE, method = "jitter", jitter = 0.2, pch = 1, col = "firebrick",
	las = 1, xlab = "Male population", ylab = "Hemoglobin concentration (g/dL)")


# Multiple histograms using base graphics, plotting them one at a time.
# The "oma" option adjusts the outer margins of the whole figure
# The "mar" option tweaks the margins within each of the 4 plots.
oldpar = par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(4,1), oma = c(4, 6, 2, 6), mar = c(2, 5, 4, 2))
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Andes"],
	col = "firebrick", las = 1, main = "Andes",  
	breaks = seq(10,26,by=1), ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Ethiopia"],
	col = "firebrick", las = 1, main = "Ethiopia",  
	breaks = seq(10,26,by=1), ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "Tibet"],
	col = "firebrick", las = 1, main = "Tibet",  
	breaks = seq(10,26,by=1), ylab = "Frequency")
hist(hemoglobinData$hemoglobin[hemoglobinData$population == "USA"],
	col = "firebrick", las = 1, main = "USA",  
	breaks = seq(10,26,by=1), ylab = "Frequency")
mtext("Hemoglobin concentration (g/dL)", side = 1, outer = TRUE, padj = 1.5)
par(oldpar) # revert to default graph settings


# Line plot with options
plot(confirmedCases ~ yearByQuarter, data = measlesData, 
	type = "l", las = 1, lwd = 1, bty = "l", xlab = "Year",
	ylab = "Number of cases", xaxp = c(1995,2012,2012-1995))
points(confirmedCases ~ yearByQuarter, data = measlesData, pch = 16, 
	col = "firebrick", cex = 0.7)