rcode12

Comparing two means: R code for Chapter 12 examples

Download the R code on this page as a single file here (make sure to install the “car” package before running).

New methods on this page

Click on a function argument for a short description of its meaning. The variable names are plucked from the examples further below.

Paired t-test and 95% confidence interval of a mean difference:
t.test(, , )

Two-sample t-test and 95% confidence interval for a difference between two means:
t.test(, , )

Other new methods:
The F test to compare two variances.
The 95% confidence interval for the variance ratio.
Levene’s test to compare variances between two (or more) groups.

Example 12.2. Red-winged blackbirds

Confidence interval for mean difference and the paired t-test, comparing immunocompetence of red-winged blackbirds before and after testosterone implants.

Read the data into a data frame. The data are in “wide” format (one row per individual).

blackbird <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter12/chap12e2BlackbirdTestosterone.csv"))
blackbird

Calculate and plot differences (Figure 12.2-2). We add a variable called d to the data frame with the After minus Before difference.

blackbird$d <- blackbird$logAfterImplant - blackbird$logBeforeImplant
head(blackbird)
hist(blackbird$d, right = FALSE, col = "firebrick")

To see how to produce the strip chart with lines (Figure 12.2-1) click .

95% confidence interval for the mean difference. 95% confidence intervals are part of the output of the t.test function, viewed in isolation by adding $conf.int to the command.

t.test(blackbird$d)$conf.int

or using the paired = TRUE argument of t.test and specifying the paired variables.

t.test(blackbird$logAfterImplant, blackbird$logBeforeImplant, paired = TRUE)$conf.int

Paired t-test. A paired t-test can be done either on differences you have already calculated (d here) or by using the paired=TRUE argument with the measurements from the two groups.

t.test(blackbird$d)

or

t.test(blackbird$logAfterImplant, blackbird$logBeforeImplant, paired = TRUE)

Example 12.3. Horned lizards

Confidence interval for the difference between two means, and the two-sample t-test, comparing horn length of live and dead (spiked) horned lizards.

lizard <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter12/chap12e3HornedLizards.csv"))
lizard

Note that there is one missing value for the variable "squamosalHornLength". Everything is easier if we eliminate the row with missing data.

lizard2 <- na.omit(lizard)
lizard2

Multiple histograms using the lattice package.

library(lattice)
histogram( ~ squamosalHornLength | Survival, data = lizard2,
	layout = c(1,2), col = "firebrick", breaks = seq(12, 32, by = 2),
	xlab = "Horn length (mm)")

Click for code to make multiple histograms using hist in base R instead.

95% confidence interval for the difference between means
The output of t.test includes the 95% confidence interval for the difference between means. Add $confint after calling the function to get R to report only the confidence interval. The formula in the following command tells R to compare squamosalHornLength between the two groups indicated by Survival.

t.test(squamosalHornLength ~ Survival, data = lizard, var.equal = TRUE)$conf.int

A two-sample t-test of the difference between two means can be carried out with t.test by using a formula, asking if squamosalHornLength is predicted by Survival, and specifying that the variables are in the data frame lizard.

t.test(squamosalHornLength ~ Survival, data = lizard, var.equal = TRUE)

Example 12.4. Salmon survival with brook trout

Welch's two-sample t-test for unequal variances, comparing chinook salmon survival in the presents and absence of brook trout. Below, we use this same example to demonstrate the 95% confidence interval for the ratio of two variances, and the F-test of equal variances.

Read the data.

chinook <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter12/chap12e4ChinookWithBrookTrout.csv"))

Set the preferred order of categories in tables and graphs

chinook$troutTreatment <- factor(chinook$troutTreatment, 
	levels = c("present", "absent"))

Strip chart of the data (bare bones version of Figure 12.4-1)

stripchart(proportionSurvived ~ troutTreatment, data = chinook, 
	method = "jitter", vertical = TRUE)

Adding the means and confidence intervals to the plot is trickier. Code for a fancier plot is given .

Calculating summary statistics by group (as found in Table 12.4-3)
Use tapply to calculate statistics by group. It has three required arguments. The first is the numeric variable of interest (a vector). The second argument is a categorical variable (a vector of the same length) identifying the groups that individuals belong to. The third argument is the name of the R function that you want to apply to the variable by group.

meanProportion <- tapply(chinook$proportionSurvived, chinook$troutTreatment, mean)
sdProportion <-   tapply(chinook$proportionSurvived, chinook$troutTreatment, sd)
nProportion <-    tapply(chinook$proportionSurvived, chinook$troutTreatment, length)
data.frame(mean = meanProportion, std.dev = sdProportion, n = nProportion)

Welch's two-sample t-test of means for unequal variances can also be done with t.test, when the var.equal argument is set to FALSE (as it is by default):

t.test(proportionSurvived ~ troutTreatment, data = chinook, var.equal = FALSE)


Here, we demonstrate the 95% confidence interval for the ratio of two variances, and F-test of equal variances, using the chinook salmon experiment.

95% confidence interval for variance ratio. (Warning: remember that the method is not robust to departures from assumption of normality.)

var.test(proportionSurvived ~ troutTreatment, data = chinook)$conf.int

F-test of equal variances (Warning: Remember that the F-test is not robust to departures from assumption of normality.)

var.test(proportionSurvived ~ troutTreatment, data = chinook)

Levene's test of equal variances. This function is in the car package, which must first be installed (see for instructions) and loaded with the library function before use.

library(car)
leveneTest(chinook$proportionSurvived, group = chinook$troutTreatment, 
	center = mean)
A numerical variable (vector) measured on all individuals.
A numerical variable (vector) measured on the same individuals as the previous variable.
This stipulates that the two variables you provided are paired (if left out, R carries out a two-sample test instead because paired = FALSE is the default).
The "~" indicates that this argument is a formula. squamosalHornLength is the response variable. Survival is the explanatory variable, indicating to which group each response measurement belongs (alive or dead in this example).
The name of the data frame containing the two variables.
Specifies that the variance is assumed to be equal in the two groups in an ordinary two-sample t-test. Set var.equal = FALSE (the default) if you want a Welch's t-test instead.

# It helps to obtain a version of the data in "long" format.
blackbird2 = reshape(blackbird, varying = 4:5, direction = "long", 
	 idvar = "blackbird", v.names = "logAntibody", 
	 times = factor(c("before","after"), levels = c("before","after")))
head(blackbird2)

par(bty="l")
stripchart(logAntibody ~ time, data = blackbird2, vertical = TRUE, 
	xlab = "Implant treatment", ylab="Antibody production rate (ln[mOD/min])",
	xlim = c(0.6,2.4), las = 1, pch = 16, col = "firebrick")
segments(1, blackbird$logBeforeImplant, 2, blackbird$logAfterImplant)


oldpar = par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(2,1), oma = c(4, 6, 2, 6), mar = c(3, 4, 3, 2))
hist(lizard2$squamosalHornLength[lizard2$Survival == "living"],
	breaks = seq(12,32,by=2), col = "firebrick", las = 1,  
	main = "living", ylab = "Frequency")
hist(lizard2$squamosalHornLength[lizard2$Survival == "killed"],
	breaks = seq(12,32,by=2), col = "firebrick", las = 1,  
	main = "killed", ylab = "Frequency")
mtext("Horn length (mm)", side = 1, outer = TRUE, padj = 0)
par(oldpar) # revert to default graph settings


# Calculate means and confidence intervals for the means.
meanProportion = tapply(chinook$proportionSurvived, chinook$troutTreatment, mean)
ciPresence = t.test(chinook$proportionSurvived[chinook$troutTreatment == "present"])$conf.int
ciAbsence = t.test(chinook$proportionSurvived[chinook$troutTreatment == "absent"])$conf.int
lower = c(ciPresence[1], ciAbsence[1])
upper = c(ciPresence[2], ciAbsence[2])

# Stripchart with options
adjustAmount = 0.2
par(bty = "l") 
stripchart(proportionSurvived ~ troutTreatment, data = chinook, vertical = TRUE, 
	method = "jitter", jitter = 0.1, pch = 1, col = "firebrick", cex = 1.5,
	las = 1, ylim = c(0, max(chinook$proportionSurvived)), lwd = 1.5,
	xlab = "Trout treatment", ylab = "Proportion surviving")
segments( c(1,2) + adjustAmount, lower, c(1,2) + adjustAmount, upper)
points(meanProportion ~ c( c(1,2) + adjustAmount ), pch = 16, cex = 1.2)


# Install the car package. This only need to be done once per computer. The following command
# downloads the package from the internet and configures it in your R folder. 
install.packages("car", dependencies = TRUE)