Handling violations of assumptions: R code for Chapter 13 examples

Download the R code on this page as a single file here.

New methods on this page

Click on a function argument for a short description of its meaning. The variable names are plucked from the examples further below.

Data transformations:
<- ( )

Quantile plots:
qqnorm(, )

Mann-Whitney U-test (R uses the equivalent Wilcoxon rank-sum test):
wilcox.test(, )

Other new methods:
Shapiro-Wilk test of normality.
Sign test.
Permutation test.

Example 13.1. Biomass in marine reserves

The normal quantile plot, Shapiro-Wilk test of normality, and the log transformation, investigating the ratio of biomass between marine reserves and non-reserve control areas.

Read and inspect the data.

marine <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter13/chap13e1MarineReserve.csv"))

Histogram of biomass ratio (Figure 13.1-4).

hist(marine$biomassRatio, right = FALSE, col = "firebrick", las = 1)

Normal quantile plot of biomass ratio data.

qqnorm(marine$biomassRatio, datax = TRUE)

Commands for a normal quantile plot with more options (Figure 13.1-4) are .

Shapiro-Wilk test of normality.


Natural log-transformation and resulting confidence interval of the a mean of marine biomass ratio.

Log-transformation. The function log() takes the natural logarithm of all the elements of a vector or variable. The following command puts the results into a new variable in the same data frame, marine.

marine$lnBiomassRatio <- log(marine$biomassRatio)

Histogram of the log-transformed marine biomass ratio (Figure 13.3-2) .

hist(marine$lnBiomassRatio, right = FALSE, col = "firebrick", las = 1)

95% confidence interval of the mean using the log-transformed data.


Back-transform lower and upper limits of confidence interval (exp is the inverse of the log function).

exp( t.test(marine$lnBiomassRatio)$conf.int )

Example 13.4. Sexual conflict and speciation

The sign test, comparing the numbers of species in 25 pairs of closely related insect taxa.

Read and inspect the data.

conflict <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter13/chap13e4SexualConflict.csv"))

Histogram of the difference in numbers of species (Figure 13.4-1).

hist(conflict$difference, right = FALSE, col = "firebrick",
	breaks = 50, xlab = "Difference in species number", las = 1)

Count up the frequency of differences that are below, equal to, and above zero. The first command below creates a new variable called zero in the data frame and sets all elements to "equal". This is just to get started. The next two commands overwrite those elements corresponding to taxon pairs with a difference greater than zero, or less than zero, respectively.

conflict$zero <- "equal"
conflict$zero[conflict$difference > 0] <- "above"
conflict$zero[conflict$difference < 0] <- "below"
conflict$zero <- factor(conflict$zero, levels = c("below", "equal", "above"))

Sign test. The sign test is just a binomial test. The result includes a confidence interval for the proportion using the Clopper-Pearson method, which isn't covered in the book.

binom.test(7, n = 25, p = 0.5)

Example 13.5. Cricket sexual cannibalism

The Wilcoxon rank-sum test (equivalent to the Mann-Whitney U-test) comparing times to mating (in hours) of starved and fed female sagebrush crickets. We also apply the permutation test to the same data.

Read and inspect data.

cannibalism <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter13/chap13e5SagebrushCrickets.csv"))

Multiple histograms (Figure 13.5-1) using the lattice package

histogram( ~ timeToMating | feedingStatus, data = cannibalism,
	layout = c(1,2), col = "firebrick", breaks = seq(0, 100, by = 20),
	type = "count", xlab = "Time to mating (hours)", ylab = "Frequency")

Commands for multiple histograms in basic R are .

Wilcoxon rank-sum test (equivalent to Mann-Whitney U-test)

wilcox.test(timeToMating ~ feedingStatus, data = cannibalism)

Permutation test of the difference between mean time to mating of starved and fed crickets.

Begin by calculating the observed difference between means (starved minus fed). The difference is -18.25734 in this data set.

cricketMeans <- tapply(cannibalism$timeToMating, cannibalism$feedingStatus, mean)
diffMeans <- cricketMeans[2] - cricketMeans[1]

Decide on the number of permutations.

nPerm <- 10000

Create a loop to permute the data many times (determined by nperm). In the loop, i is just a counter that goes from 1 to nPerm by 1; each permuted difference is saved in the permResult.

permResult <- vector() # initializes
for(i in 1:nPerm){
	# step 1: permute the times to mating
	permSample <- sample(cannibalism$timeToMating, replace = FALSE)
	# step 2: calculate difference betweeen means
	permMeans <- tapply(permSample, cannibalism$feedingStatus, mean)
	permResult[i] <- permMeans[2] - permMeans[1]

Plot null distribution based on the permuted differences (Figure 13.8-1).

hist(permResult, right = FALSE, breaks = 100)

Use the null distribution to calculate an approximate P-value. This is the twice the proportion of the permuted means that fall below the observed difference in means, diffMeans (-18.25734 in this example). The following code calculates the number of permuted means falling below diffMeans.

sum(as.numeric(permResult <= diffMeans))

These commands obtain the fraction of permuted means falling below diffMeans.

sum(as.numeric(permResult <= diffMeans)) / nPerm

Finally, multiply by 2 to get the P-value for a two-sided test.

2 * ( sum(as.numeric(permResult <= diffMeans)) / nPerm )
Creates a new variable named lnBiomassRatio in the data frame marine.
Takes the natural logarithm of all the elements in the data vector. Use asin(sqrt(data)) for the arcsine transformation and sqrt(data + 1/2) for the square root transformation.
The data vector (variable) to be transformed.
A vector of numerical variables; the data.
Puts the data values on the x-axis instead of the y-axis (the default).
A formula relating the response variable timeToMating to the explanatory variable feedingStatus.
The name of the data frame containing the variables.

qqnorm(marine$biomassRatio, datax = TRUE, pch = 16, col = "firebrick", 
	las = 1, ylab = "Biomass ratio", xlab = "Normal quantile", main = "")

oldpar = par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(2,1), oma = c(4, 6, 2, 6), mar = c(3, 4, 3, 2))
hist(cannibalism$timeToMating[cannibalism$feedingStatus == "starved"],
	col = "firebrick", las = 1, breaks = seq(0, 100, by = 20), 
	main = "starved", ylab = "Frequency")
hist(cannibalism$timeToMating[cannibalism$feedingStatus == "fed"],
	col = "firebrick", las = 1, breaks = seq(0, 100, by = 20), 
	main = "fed", ylab = "Frequency")
mtext("Time to mating (hours)", side = 1, outer = TRUE, padj = 0)
par(oldpar) # revert to default graph settings