A function to help graphical model checks of lm and ANOVA

As always a more colourful version of this post is available on rpubs.

Even if LM are very simple models at the basis of many more complex ones, LM still have some assumptions that if not met would render any interpretation from the models plainly wrong. In my field of research most people were taught about checking ANOVA assumptions using tests like Levene & co. This is however not the best way to check if my model meet its assumptions as p-values depend on the sample size, with small sample size we will almost never reject the null hypothesis while with big sample even small deviation will lead to significant p-values (discussion). As ANOVA and linear models are two different ways to look at the same model (explanation) we can check ANOVA assumptions using graphical check from a linear model. In R this is easily done using plot(model), but people often ask me what amount of deviation makes me reject a model. One easy way to see if the model checking graphs are off the charts is to simulate data from the model, fit the model to these newly simulated data and compare the graphical checks from the simulated data with the real data. If you cannot differentiate between the simulated and the real data then your model is fine, if you can then try again!

EDIT: You can also make the conclusion form your visual inspection more rigorous following the protocols outlined in this article. (Thanks for Tim comment)

Below is a little function that implement this idea:

lm.test<-function(m){
  require(plyr)
  #the model frame
  dat<-model.frame(m)
  #the model matrix
  f<-formula(m)
  modmat<-model.matrix(f,dat)
  #the standard deviation of the residuals
  sd.resid<-sd(resid(m))
  #sample size
  n<-dim(dat)[1]
  #get the right-hand side of the formula  
  #rhs<-all.vars(update(f, 0~.))
  #simulate 8 response vectors from model
  ys<-lapply(1:8,function(x) rnorm(n,modmat%*%coef(m),sd.resid))
  #refit the models
  ms<-llply(ys,function(y) lm(y~modmat[,-1]))
  #put the residuals and fitted values in a list
  df<-llply(ms,function(x) data.frame(Fitted=fitted(x),Resid=resid(x)))
  #select a random number from 2 to 8
  rnd<-sample(2:8,1)
  #put the original data into the list
  df<-c(df[1:(rnd-1)],list(data.frame(Fitted=fitted(m),Resid=resid(m))),df[rnd:8])

  #plot 
  par(mfrow=c(3,3))
  l_ply(df,function(x){
    plot(Resid~Fitted,x,xlab="Fitted",ylab="Residuals")
    abline(h=0,lwd=2,lty=2)
  })

  l_ply(df,function(x){
    qqnorm(x$Resid)
    qqline(x$Resid)
  })

  out<-list(Position=rnd)
  return(out)
}

This function print the two basic plots: one looking at the spread of the residuals around the fitted values, the other one look at the normality of the residuals. The function return the position of the real model in the 3×3 window, counting from left to right and from top to bottom (ie position 1 is upper left graph).

Let’s try the function:

#a simulated data frame of independent variables
dat<-data.frame(Temp=runif(100,0,20),Treatment=gl(n = 5,k = 20))
contrasts(dat$Treatment)<-"contr.sum"
#the model matrix
modmat<-model.matrix(~Temp*Treatment,data=dat)
#the coefficient
coeff<-rnorm(10,0,4)
#simulate response data
dat$Biomass<-rnorm(100,modmat%*%coeff,1)
#the model
m<-lm(Biomass~Temp*Treatment,dat)
#model check
chk<-lm.test(m)

Can you find which one is the real one? I could not, here is the answer:

chk
$Position
[1] 4

Happy and safe modelling!

6 thoughts on “A function to help graphical model checks of lm and ANOVA”

Pingback: Distilled News | Data Analytics & R
Tim says:

27 March 2015 at 8:34 AM

I would add a reference to your post: http://rsta.royalsocietypublishing.org/content/367/1906/4361

1. grumble10 says:
  
  27 March 2015 at 10:37 AM
  
  Thanks for the nice reference I edited the post, just quickly skimmed through the article, but seems very relevant to the approach.
nausicaa says:

23 September 2016 at 9:08 AM

Thanks, this is very helpful. I read your post about glms, but do you know of a similar function for mixed models ?

1. grumble10 says:
  
  24 September 2016 at 12:15 PM
  
  Thank you for your kind words, for mixed-effect model have a look at this package: https://cran.r-project.org/web/packages/DHARMa/index.html
Pingback: Making a case for hierarchical generalized models – biologyforfun

	grumble10 on Generating ANOVA-like table fr…
	Shauni Van Herck on Generating ANOVA-like table fr…
	Oliver on Adding standard errors for int…
	Jongwon on Help! I have convergence warni…
	Rohan on Checking (G)LM model assumptio…

A function to help graphical model checks of lm and ANOVA

Published by grumble10

6 thoughts on “A function to help graphical model checks of lm and ANOVA”

Leave a comment Cancel reply

Share this:

Related

Published by grumble10

6 thoughts on “A function to help graphical model checks of lm and ANOVA”

Leave a comment Cancel reply