Sometime I happen to be wrong, this is one of these instance. The issue: a colleague measured individual plant growth and measured light irradiation received by each individual, the plants where in groups of 10 individuals and he measured soil parameters at the group-level. To analyze the effect of light on plant growth while controlling for soil and group variation, he built a mixed-effect model with group as a random term and light plus soil as fixed effect. As soil was measured at the group-level all individuals within a group got the same soil value.

When he came and asked if what he was doing was correct, I said that he had a classical multilevel dataset and that it was not possible to fit it using lme4, since the model would not know how to differentiate between variation explained by group effects and variation explained by soil effects. Actually this is possible, provided that the variable measured at the group-level is continuous (see this old post on a similar issue), one may add group-level predictors in lme4 fits. Let’s see how it works:

```#load libraries
library(lme4)
library(ggplot2)
library(gridExtra)

#simulate some data

#first case one group-level predictor
#and one individual level no interaction

#group-level predictor
soil <- runif(20,-2,2)
#individual level predictor
light <- runif(200,-2,2)
#group-level regression
grp_eff <- rnorm(20,2*soil,1)
#group index
group <- gl(n = 20,k = 10,
labels = letters[1:20])
#expected individual values
linpred <- 2 + grp_eff[group] + 2*light
#simulated individual values
growth <- rnorm(200,linpred,1)
#putting it together
dat <- data.frame(Group=group,Light=light,
Soil=soil[group],Growth=growth)

#model
(m <-lmer(Growth~Light+Soil+(1|Group),dat))

Linear mixed model fit by REML ['lmerMod']
Formula: Growth ~ Light + Soil + (1 | Group)
Data: dat
REML criterion at convergence: 608.1304
Random effects:
Groups Name Std.Dev.
Group (Intercept) 0.6773
Residual 1.0046
Number of obs: 200, groups: Group, 20
Fixed Effects:
(Intercept) Light Soil
2.354 2.042 2.122
```

The parameter estimates looks pretty good, closed to the values used to simulate the data. It works! One can easily add group-level regression in lme4 provided that the group-level predictor(s) are continuous.
Now we can make nice plots of the group and individual-level regressions.

```#gather group-level deviation values
#in one dataframe together with deviation
rf <- ranef(m,condVar=TRUE)
rff <- data.frame(Group=letters[1:20],Soil=soil,RF=rf\$Group[,1],Var=attr(rf\$Group,"postVar")[,,1:20])
#compute the group level regression
rff\$RFF <- with(rff,RF+fixef(m)*Soil)
#plot
p1 <- ggplot(rff,aes(x=Soil,y=RFF,ymin=RFF-2*Var,ymax=RFF+2*Var))+geom_point()+
geom_abline(aes(intercept=0,slope=fixef(m)))+geom_linerange()+
labs(y="Group-level effect",title="Group-level regression")

#predict individual-level regression
#for average group and average soil
newdat <- expand.grid(Light=seq(-2,2,length=20),Soil=0)
newdat\$Pred <- predict(m,newdata=newdat,re.form=~0)
#plot
p2 <- ggplot(dat,aes(x=Light,y=Growth))+geom_point()+
geom_line(data=newdat,aes(x=Light,y=Pred))+labs(title="Individual-level regression")

grid.arrange(p1,p2,ncol=2)``` A word of caution now on using group-level regressions: I would be very careful (or better avoid) using p-values from such models. The degrees of freedom to test the soil effect are clearly inflated and are hard to guess, so I would advise against significance testing on the group-level regression.
We can of course expand this approach to more complex situation (see some examples here), let’s see an example where we had an interaction between the group and the individual-level predictors:

```#add interaction between group and individual-level
#and unbalanced design

#group index
group <- factor(sample(letters[1:20],
size = 200,replace=TRUE))
grp_eff <- rnorm(20,2*soil,1)

#expected individual values
linpred <- 2 + grp_eff[group] + 2 * light -  (grp_eff[group] * light)
#simulated individual values
growth <- rnorm(200,linpred,1)
#putting it together
dat <- data.frame(Group=group,Light=light,
Soil=soil[group],Growth=growth)

#model
(m <-lmer(Growth~Light*Soil+(1|Group),dat))

Linear mixed model fit by REML ['lmerMod']
Formula: Growth ~ Light * Soil + (1 | Group)
Data: dat
REML criterion at convergence: 731.4661
Random effects:
Groups   Name        Std.Dev.
Group    (Intercept) 0.5474
Residual             1.4182
Number of obs: 200, groups:  Group, 20
Fixed Effects:
(Intercept)        Light         Soil   Light:Soil
2.138        1.961        1.844       -1.865

```

This is where it starts to get tricky for me, the estimate of the interaction between group and individual-level predictors is twice as big as the simulated value. Maybe I am doing something wrong in the simulation part …
Anyhow we can still do some cool plots:

```#group-level plot
rf <- ranef(m,condVar=TRUE)
rff <- data.frame(Group=letters[1:20],Soil=soil,RF=rf\$Group[,1],Var=attr(rf\$Group,"postVar")[,,1:20])
#compute the group level regression
rff\$RFF <- with(rff,RF+2.16*Soil)
p1 <- ggplot(rff,aes(x=Soil,y=RFF,ymin=RFF-2*Var,ymax=RFF+2*Var))+geom_point()+
geom_abline(aes(intercept=0,slope=2.16))+geom_linerange()+
labs(y="Group-level effect",title="Group-level regression")

#predict individual effect for average group and average soil
newdat <- expand.grid(Light=seq(-2,2,length=20),Soil=c(-2,0,2))
newdat\$Pred <- predict(m,newdata=newdat,re.form=~0)

p2 <- ggplot(dat,aes(x=Light,y=Growth))+geom_point()+
geom_line(data=newdat,aes(x=Light,y=Pred,color=factor(Soil)))+
labs(title="Individual-level regression")+
scale_color_discrete(name="Soil")

grid.arrange(p1,p2,ncol=2)

``` That’s it, one does not need to use more complex modelling technique like maximum likelihood estimation or multilevel bayesian regression, one can use rather straightforward lme4 fits for exploring group-level regressions.
A must-read for anyone interested in hierarchical/multilevel models is the Gelman and Hill book.

## One thought on “Adding group-level predictors in GLMM using lme4”

1. Murilo says:

I was exactly looking for such situation: including or not group-level predictors in a lmer model. But I’m still uncomfortable with lmer guts: if a random intercept model removes intergroup variability and send it to the random part of the model, why does it still successfully estimate an effect for the group-level predictor??? Since for me it seems conceptually impossible to account for the random intercept effect and yet detect a group predictor effect. Could please provide any further advise on it? I would really appreciate it. Cheers