[UPDATE: I modified the code of the function a bit; you no longer need to pass the random effect terms as character]

[UPDATE 2: I added some lines so that glmer.nb models can be passed to the function; be aware that passing such models to the function will take quite some computing time]

[UPDATE 3: you can find an up-to-date version of the function on github: https://github.com/lionel68/Blog/tree/master/anova_merMod]

This article may also be found on RPubs: http://rpubs.com/hughes/63269

In the list of worst to best ways to test for effects in GLMMs, the FAQ at http://glmm.wikidot.com/faq states that parametric bootstrapping is among the best options. PBmodcomp in the pbkrtest package implements such parametric bootstrapping by comparing a full model to a null one. The function simulates data (the response vector) from the null model, fits both the null and the full model to these simulated data, and derives a likelihood ratio test (LRT) statistic for each simulated dataset. We can then compare the observed LRT statistic to the null distribution generated from the many simulations and derive a p-value. The advantage of this method over the classical p-value derived from a chi-square test on the LRT statistic is that the parametric bootstrap does not assume any particular null distribution (like the chi-square) but instead derives its own null distribution from the model and the data at hand; we therefore do not assume that the LRT statistic is chi-square distributed. I have written a little function that wraps around PBmodcomp to compute bootstrapped p-values for each term in a model by adding the terms sequentially. This leads to the kind of ANOVA-like table that is typically obtained when calling anova on a glm object.
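
To make the mechanism concrete, here is a minimal sketch of a direct PBmodcomp call comparing two nested models fitted to the grouseticks data that ships with lme4 (the models are purely illustrative); the wrapper function below automates this comparison for every term in the model:

#minimal sketch: comparing a full model to a null model by hand with PBmodcomp
library(lme4)
library(pbkrtest)
data(grouseticks)
m_full <- lmer(TICKS ~ cHEIGHT + (1 | BROOD), grouseticks, REML = FALSE)
m_null <- lmer(TICKS ~ 1 + (1 | BROOD), grouseticks, REML = FALSE)
#simulate response vectors under the null model, refit both models to each
#simulated dataset and compare the observed LRT to this simulated null distribution
pb <- PBmodcomp(m_full, m_null, nsim = 200, seed = 20)
summary(pb)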

#the libraries used
library(lme4)
library(arm)
library(pbkrtest)
#the function with the following arguments
#@model the merMod model fitted by lmer or glmer
#@w a logical indicating whether a binomial model was used or not; note that the binomial model should be fitted using the weights argument
#@seed you can set a seed to reproduce the same results after bootstrapping
#@nsim the number of bootstrapped simulations; if set to 0 the classical chi-square p-values from the LRT are returned
#@fixed should the variables be dropped based on the order given in the model (TRUE) or should the dropping go from worst to best variable (FALSE)
anova_merMod<-function(model,w=FALSE,seed=round(runif(1,0,100),0),nsim=0,fixed=TRUE){
  require(pbkrtest)
  data<-model@frame
  if(w){
    weight<-data[,which(names(data)=="(weights)")]
    data<-data[,-which(names(data)=="(weights)")]
  }
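  #extract the response and rebuild the random effect part of the formula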
  f<-formula(model)
  resp<-as.character(f)[2]
  rand<-lme4:::findbars(f)
  rand<-lapply(rand,function(x) as.character(x))
  rand<-lapply(rand,function(x) paste("(",x[2],x[1],x[3],")",sep=" "))
  rand<-paste(rand,collapse=" + ")
  
  #generate a list of reduced model formula
  fs<-list()
  fs[[1]]<-as.formula(paste(resp,"~ 1 +",rand))
  nb_terms<-length(attr(terms(model),"term.labels"))
  if(nb_terms>1){
    #the next lines order the terms (when fixed=FALSE) so that the most important term is added first
    #and the least important one last, with the main effects added before the interactions
    mat<-data.frame(term=attr(terms(model),"term.labels"),SSQ=anova(model)[,3],stringsAsFactors = FALSE)
    mat_inter<-mat[grep(":",mat$term),]
    mat_main<-mat[!rownames(mat)%in%rownames(mat_inter),]
    if(!fixed){
      mat_main<-mat_main[do.call(order,list(-mat_main$SSQ)),]
      mat_inter<-mat_inter[do.call(order,list(-mat_inter$SSQ)),]
      mat<-rbind(mat_main,mat_inter)
    }   
    for(i in 1:nb_terms){
      tmp<-c(mat[1:i,1],rand)
      fs[[i+1]]<-reformulate(tmp,response=resp)
    }      
  }
  else{
    mat<-data.frame(term=attr(terms(model),"term.labels"),stringsAsFactors = FALSE)
  }

  #fit the reduced models to the data, choosing the fitting function based on the model family

  
  fam<-family(model)[1]$family
  
  if(fam=="gaussian"){
    m_fit<-lapply(fs,function(x) lmer(x,data,REML=FALSE))
  }
  else if(fam=="binomial"){
    data$weight <- weight
    m_fit<-lapply(fs,function(x) glmer(x,data,family=fam,weights=weight))
  }
  else if("Negative Binomial" %in% substr(fam,1,17)){
    m_fit<-lapply(fs,function(x) glmer.nb(x,data))
  }
  else{
    m_fit<-lapply(fs,function(x) glmer(x,data,family=fam))
  }

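  #special case: with a single fixed term the full model itself is the second model to compare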
  if(nb_terms==1){
    if(fam=="gaussian"){
      m_fit[[2]]<-lmer(formula(model),data,REML=FALSE)
    }
    else if(fam=="binomial"){
      data$weight <- weight
      m_fit[[2]]<-glmer(formula(model),data,family=fam,weights=weight)
    }
    else if("Negative Binomial" %in% substr(fam,1,17)){
      m_fit[[2]]<-glmer.nb(formula(model),data)
    }
    else{
      m_fit[[2]]<-glmer(formula(model),data,family=fam)
    }
  }
  #compare nested model with one another and get LRT values (ie increase in the likelihood of the models as parameters are added)
  tab_out<-NULL
  
  for(i in 1:(length(m_fit)-1)){
    if(nsim>0){
      comp<-PBmodcomp(m_fit[[i+1]],m_fit[[i]],seed=seed,nsim=nsim)    
      term_added<-mat[i,1]
      #here the bootstrapped p-values are reported, ie without assuming any parametric distribution (like the chi-square) for the LRT values generated under the null model
      #these p-values represent the proportion of times the simulated LRT values (under the null model) are larger than the observed one
      tmp<-data.frame(term=term_added,LRT=comp$test$stat[1],p_value=comp$test$p.value[2])
      tab_out<-rbind(tab_out,tmp)
      print(paste("Variable ",term_added," tested",sep=""))
    }
    else{
      comp<-anova(m_fit[[i+1]],m_fit[[i]])
      term_added<-mat[i,1]
      tmp<-data.frame(term=term_added,LRT=comp[2,6],p_value=comp[2,8])
      tab_out<-rbind(tab_out,tmp)
    }
    
  }  
  print(paste("Seed set to:",seed))
  return(tab_out)  
}

You pass your merMod model to the function; if you fitted a binomial GLMM using the weights argument you also need to set w to TRUE. You can then set a seed to make the bootstrap reproducible, and nsim controls the number of simulations: it is set to 0 by default, which returns the classical chi-square p-values, but if you want to report bootstrapped p-values you should use larger values (ie 1000, 10000). Finally, fixed controls whether the terms are added in the order given in the model formula (TRUE) or from most to least important (FALSE).
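
For reference, an illustrative call spelling out every argument for a fitted merMod object m could look like this (the values are arbitrary):

#illustrative call with all arguments spelled out (values are arbitrary)
anova_merMod(model = m, w = FALSE, seed = 42, nsim = 1000, fixed = FALSE)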

Let’s look at a simple LMM example:

data(grouseticks)
m<-lmer(TICKS~cHEIGHT+YEAR+(1|BROOD),grouseticks)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: TICKS ~ cHEIGHT + YEAR + (1 | BROOD)
##    Data: grouseticks
## 
## REML criterion at convergence: 2755
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.406 -0.246 -0.036  0.146  5.807 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  BROOD    (Intercept) 87.3     9.34    
##  Residual             28.1     5.30    
## Number of obs: 403, groups:  BROOD, 118
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   5.4947     1.6238    3.38
## cHEIGHT      -0.1045     0.0264   -3.95
## YEAR96        4.1910     2.2424    1.87
## YEAR97       -4.3304     2.2708   -1.91
## 
## Correlation of Fixed Effects:
##         (Intr) cHEIGH YEAR96
## cHEIGHT -0.091              
## YEAR96  -0.726  0.088       
## YEAR97  -0.714  0.052  0.518
anova_merMod(model=m,nsim=100)
## [1] "Variable cHEIGHT tested"
## [1] "Variable YEAR tested"
## [1] "Seed set to: 23"
##     term      LRT    p_value
## 1 cHEIGHT 14.55054 0.00990099
## 2    YEAR 14.40440 0.00990099

The resulting table shows, for each term in the model, the likelihood ratio test statistic, which is basically the decrease in deviance when going from the null to the full model, and the p-value; you may look at the PBtest line in the Details section of ?PBmodcomp to see how it is computed.
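
For comparison, setting nsim to 0 skips the bootstrap entirely and returns the classical chi-square p-values from the same sequential likelihood ratio tests:

#no bootstrap: classical chi-square p-values from the sequential LRTs
anova_merMod(model = m, nsim = 0)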

Now let’s see how to use the function with a binomial GLMM:

#simulate some binomial data
x1<-runif(100,-2,2)
x2<-runif(100,-2,2)
group<-gl(n = 20,k = 5)
rnd.eff<-rnorm(20,mean=0,sd=1.5)
p<-1+0.5*x1-2*x2+rnd.eff[group]+rnorm(100,0,0.3)
y<-rbinom(n = 100,size = 10,prob = invlogit(p))
prop<-y/10
#fit a model
m<-glmer(prop~x1+x2+(1|group),family="binomial",weights = rep(10,100))
summary(m)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: prop ~ x1 + x2 + (1 | group)
## Weights: rep(10, 100)
## 
##      AIC      BIC   logLik deviance df.resid 
##    288.6    299.1   -140.3    280.6       96 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -2.334 -0.503  0.181  0.580  2.466 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 1.38     1.18    
## Number of obs: 100, groups:  group, 20
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    0.748      0.287    2.61   0.0092 ** 
## x1             0.524      0.104    5.02  5.3e-07 ***
## x2            -2.083      0.143  -14.56  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##    (Intr) x1    
## x1  0.090       
## x2 -0.205 -0.345
#takes some time
anova_merMod(m,w = TRUE,nsim=100)
## [1] "Variable x1 tested"
## [1] "Variable x2 tested"
## [1] "Seed set to: 98"
##   term      LRT p_value
## 1   x1   0.0429 0.80392
## 2   x2 502.0921 0.01961

For a binomial model, the model must be fitted on proportion data with a vector of weights (ie the number of binomial trials), and the w argument of the function must be set to TRUE. Some warning messages may pop up at the end of the function; these come from convergence failures in PBmodcomp and do not affect the results. You may read the article on the pbkrtest package (http://www.jstatsoft.org/v59/i09/) to better understand where they come from.
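
As an aside, if you prefer not to rely on objects floating in the global environment, here is a sketch that gathers the simulated variables from above into a data frame before fitting (the names dat, m2 and ntrials are just illustrative), storing the number of trials as a regular column:

#gather the simulated variables in a data frame with the number of trials as a column
dat <- data.frame(prop = prop, x1 = x1, x2 = x2, group = group, ntrials = 10)
m2 <- glmer(prop ~ x1 + x2 + (1 | group), data = dat,
            family = "binomial", weights = ntrials)
#w = TRUE tells anova_merMod to retrieve the weights from the model frame
anova_merMod(m2, w = TRUE, nsim = 100)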

Happy modeling, and as Ben Bolker says: “When all else fails, don’t forget to keep p-values in perspective”: http://www.phdcomics.com/comics/archive.php?comicid=905

20 thoughts on “Generating ANOVA-like table from GLMM using parametric bootstrap”

  1. Hi! I really like this code! Do you see any reason which would prevent adding glmer.nb (negative binomial)? I use a lot of count models with over-dispersion and I’d like to use your code for that!

  2. Also, unless I’m wrong, the function fails if you put only one fixed effect:

    For instance,

    Cluster.T<-lmer(Scabbed ~ Traitement +(1 | Bloc),data= TableauClusterShort)

    anova_merMod(model=Cluster.T,rand="(1|Bloc)")

    Error in m_fit[[i + 1]] : subscript out of bounds

    So determining effect of treatment (compared to NULL), fails.

    1. Well spotted mistake, I modified the code in the article, should work now. Thanks for trying out the code, I hope it’ll help you, I’ll look into glmer.nb later on, should be relatively straightforward to implement it, which package do you use?

    2. Hi! I will try the update version later today. For NB, you can add this code:

      else if("Negative Binomial"%in%fam){
      m_fit<-lapply(fs,function(x) glmer.nb(x,data))
      }

      This works with lme4. My question was to determine if it's "appropriate" and valid for glmm in general?

    3. Cool, I did not know that lme4 had a NB GLMM. Theoretically, parametric bootstrapping is applicable to any model type; we usually do not use it for simpler models (GLM), as the likelihood ratio test follows a chi-square distribution there, but in the case of GLMMs this is not always the case. I added your modification to the code; somehow when I try it, it fails to find a function called sFun, I’ll look at this later on.

  3. By the way, I tried to use the function “mixed” from the “afex” package =

    mixed(formula,data,method="KR") #works
    mixed(formula,data,method="LRT") #works
    mixed(formula,data,method="PB",args.test = list(nsim = 10),family=negative.binomial(8)) # which seems to be what you are doing (?) = throws "could not find function "sfun""

    1. Cool package, never used it. The method="PB" is indeed close to what the function I posted does. There is however a difference: mixed reports type 3 p-values whereas anova_merMod reports type 1. I used type 1 because I wanted a simple extension of what anova.glm does, which is basically adding terms in order and computing the drop in deviance from one model to the next. This is basically what anova_merMod does. Concerning the sfun issue, the error comes when PBmodcomp tries to simulate new responses from the NB model. There is apparently some internal machinery still missing in glmer.nb …

  4. This code is fantastic! Reading the comments section I tried to change it for negative binomial but couldn’t get it to run for glm.nb or glmmadmb which I actually need it for. Were you ever able to make it run with negative binomial?

    1. Hi Ciara, Thanks for your kind words! If you use glm.nb have a look at ?anova.negbin, this should give you what you want. If you use glmer.nb it seems that PBmodcomp now works for such models, so you could try the anova_merMod function. Do note that it will take quite some time to run and you might get quite a few convergence warnings along the way; also, the results must be taken very carefully. The function anova_merMod will not work for glmmadmb models. Hope this helps.

  5. This code is great! I could very well be missing something here in the way LRTs should be used, but, could the code be used for other families such as poisson and quasi-poisson. Thanks

    1. Hi Simon, Thanks for your kind words. So far the function (should) work for all lmer, glmer and glmer.nb models out there. So Poisson models will work, but quasi-Poisson models are not implemented in lme4; you could use glmer.nb instead.

  6. When running this with a glmer with binomial distribution, I’m getting an error saying “Error in if (w) { : argument is not interpretable as logical”. Is ‘w’ expecting a logical or a vector? Currently weights is a vector of values corresponding to the size of the population from which the proportions were calculated.

    If I run it using a logical value, e.g. anova_merMod(glmer_midg_01, w = TRUE, nsim = 50) I get an error saying ‘weight’ is not found.

    Any suggestions?

    1. Thanks Charles for your interest. It was indeed an issue with the function. What happens is that glmer is looking for the weight variable only as a column with the same name in the data, I do not know if this is a bug or a feature of glmer … I updated the post, you can also head over to the github for the function: https://github.com/lionel68/Blog/tree/master/anova_merMod. Trying this out with the cbpp dataset worked for me. Let me know if you still have some issues.

  7. Thank you so much for this code! I ran into it when looking for a solution to get p-values from a robust mixed model (function ‘rlmer’ using package ‘robustlmm’, Koller 2016). The older version of anova_merMod (which I found on Rpubs) works on my data (even though throwing me some convergence warnings), but the updated version shown here doesn’t work anymore, because anova can’t be applied on class rlmerMod. How bad is it to keep working with the previous version, or else is there a way to get your function working for class rlmerMod?

    1. Hi Shauni,
      Thanks for your interest in this old piece of code. I’ll have to look at the robustlmm package, but if the authors did not implement an anova method for the rlmerMod object this will be tricky. Maybe you could ask the package developer to implement an anova method? Normally the function should work once this is done …
