After all the hard work of collecting the data, thinking about appropriate models, and formatting everything, you finally run your model. This is it, you are about to get the long-awaited results, and BOOM, you get this kind of message:
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.52673
## (tol = 0.001, component 17)
What does it mean? Is it bad? And what can you do about it?
What is convergence?
Only a few models have an exact solution that can be derived directly from the data; one such case is the simple linear model (fitted via lm in R). For these models there are equations providing all the answers you need, such as parameter estimates and p-values, just from the data at hand (see this wiki page). For all other models, such as GLMs or GLMMs, an algorithm must be run to find the estimates of interest. In essence it can be compared to asking a blind dog (the algorithm) to find the highest possible point (the maximum likelihood) in a defined landscape (the parameter space) in a limited amount of time (the number of iterations). In some cases, where there is a clear slope and only one hill, the dog finds the goal in just a few iterations. In other cases, where the likelihood landscape is made of several hills with large flat areas in between, the poor blind dog will run around, but once time has run out it will report that it could not find the highest possible point and will give you a convergence warning.
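The blind-dog search can be sketched in a few lines of R. This is not what lme4 does internally, just a minimal illustration using base R's optim() on simulated Poisson data (the data and starting value are made up for the example):

```r
# Simulated count data whose true mean is 5
set.seed(42)
y <- rpois(100, lambda = 5)

# Negative log-likelihood of lambda (optim() minimizes, so we negate);
# the log scale keeps lambda positive during the search
nll <- function(log_lambda) -sum(dpois(y, exp(log_lambda), log = TRUE))

# Send the "blind dog" off from an arbitrary starting point
fit <- optim(par = 0, fn = nll, method = "BFGS")

fit$convergence  # 0 means the algorithm reports successful convergence
exp(fit$par)     # estimated lambda, close to mean(y)
```

With a single parameter and a smooth, single-peaked likelihood, the search converges almost instantly; convergence warnings arise when the surface is flat, multi-peaked, or high-dimensional.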
Is it bad?
In short, yes, very much. You cannot trust the parameter estimates from models that did not converge, and even less the derived quantities such as standard errors or p-values. So there is no need to do residual checks or plot the results; this model cannot be trusted and you should try your luck again. That said, package developers have to set thresholds (sometimes also called tolerances) separating a model that converged from one that did not. Developers tend to be conservative, to make sure that inference drawn from the models can be trusted. But sometimes the model at hand is very close to the threshold; for instance, if the metric tracking convergence is 2e-6 and the threshold is 1e-6, it should not be too hard to reach convergence with a few small tweaks.
What can I do?
Below is a non-exhaustive list of what can be done to try to reach convergence for problematic models. I ordered this list based on the steps that I tend to follow when encountering convergence problems; you may also want to check this page with additional info on convergence issues in lme4:
- Standardize all predictors: if the explanatory variables in the model have very different scales, like one variable ranging from -100 up to 10000 and another from 0.001 to 0.01, the parameter space becomes very complex for the algorithm to navigate. By standardizing the predictors, everything is on the same scale; see this article [link] for more reasons to standardize your predictors.
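A quick sketch of standardizing with base R's scale(), on a made-up data frame mimicking the scales mentioned above:

```r
# Hypothetical data with predictors on wildly different scales
set.seed(1)
d <- data.frame(y         = rnorm(50),
                elevation = runif(50, -100, 10000),
                rate      = runif(50, 0.001, 0.01))

# scale() centers each variable and divides by its standard deviation,
# putting both predictors on a mean-0, SD-1 scale
d$elevation_z <- as.numeric(scale(d$elevation))
d$rate_z      <- as.numeric(scale(d$rate))

m <- lm(y ~ elevation_z + rate_z, data = d)
```

The coefficients are then directly comparable (change in y per standard deviation of the predictor), which also makes weakly informative priors easier to set if you later go Bayesian.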
- Try different distributions: would it make sense to try a Poisson distribution? Are your data over- or underdispersed? Usually we have a pretty good idea of what distribution the data should follow, but convergence warnings may be alleviated by using simpler distributions, such as a Gaussian distribution to approximate Poisson-distributed data when the mean is large (say, larger than 10). Also, the fitted distribution sometimes assumes a particular mean-variance relation that may simply be wrong for the data at hand; in that case, trying more general distributions such as the ones provided in the glmmTMB package may solve the convergence problem.
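For example, with glmmTMB switching the distribution is just a change of the family argument. The data frame `d`, response `count`, predictor `x`, and grouping factor `g` below are hypothetical placeholders:

```r
library(glmmTMB)

# Poisson fit: assumes variance equals the mean
m_pois <- glmmTMB(count ~ x + (1 | g), family = poisson, data = d)

# Negative binomial fit: allows variance to grow faster than the mean,
# often more appropriate for overdispersed counts
m_nb <- glmmTMB(count ~ x + (1 | g), family = nbinom2, data = d)
```

If the negative binomial model converges where the Poisson one did not, that is itself a hint that the assumed mean-variance relation was off.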
- Restart the model from the parameter estimates reached before the algorithm gave up: the algorithm starts from default starting values, which might be pretty far from the point of maximum likelihood. The idea is to use the parameter values from a previous failed fit as starting points; in lme4 this is done via the following code:
# extract the random-effect (theta) and fixed-effect estimates
# where the previous fit stopped
ss <- getME(model_1, c("theta", "fixef"))
# refit the same model, starting the search from those values
model_2 <- update(model_1, start = ss)
- Think deeply about the model complexity: these days it is so easy to fit very complex models without realizing the dazzling complexity hidden under the hood. Take a step back and think: do you really need these nested random effects? Can you drop this 5-way interaction? Realize that the model you can fit depends on the amount and quality of your data. This is why it is usually good to think about the model you will fit when designing the data collection: with the protocol you will follow, will you be able to fit the models you want and extract the signal you are after?
- Try a different algorithm: there is usually more than one algorithm available to get to the answer for a specific model type; lme4, for instance, offers two algorithms by default, Nelder-Mead and bobyqa.
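In lme4 the optimizer is selected through the control argument. Again, `d`, `y`, `x` and `g` are hypothetical placeholders:

```r
library(lme4)

# Same model, two different optimizers
m_bobyqa <- glmer(y ~ x + (1 | g), family = binomial, data = d,
                  control = glmerControl(optimizer = "bobyqa"))
m_nm     <- glmer(y ~ x + (1 | g), family = binomial, data = d,
                  control = glmerControl(optimizer = "Nelder_Mead"))

# allFit() refits a model with every available optimizer, so you can
# check whether they all land on (roughly) the same estimates
# summary(allFit(m_bobyqa))
```

If several optimizers agree on the estimates even though one of them throws a warning, the warning is often a false alarm.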
- Run the algorithm with a higher number of iterations: if no solution was found in the first 1000 iterations of the algorithm, maybe 10000 will work better. But be careful with model running time; you do not want to wait for weeks.
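In lme4 the cap on function evaluations is raised via optCtrl (the model and data below are hypothetical placeholders):

```r
library(lme4)

# Allow up to 100000 function evaluations instead of the default
m <- glmer(y ~ x + (1 | g), family = binomial, data = d,
           control = glmerControl(optCtrl = list(maxfun = 1e5)))
```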
- Gather more data: model complexity should grow with sample size and/or data quality. If there is only a vague signal in your data, the likelihood surface will be pretty hard for the algorithm to navigate; a higher signal-to-noise ratio will help.
- Go Bayesian: models fitted with classical statistics only use information from the data, which means the algorithm only gets guidance from the data as to where to find the point of highest likelihood. Yet most of the time we could provide some extra guidance in the form of prior information on likely values for the model parameters. For instance, in a logistic regression with standardized predictors it is unlikely that a slope parameter will be smaller than -5 or larger than 5 (run "curve(invlogit(5*x),-2,2)" in R after loading the arm package). In Bayesian data analysis, weakly informative priors are used to prevent the algorithm from wandering into parameter spaces where the values make little sense, while still letting the data control the sampling. So the bottom line is: if you want to fit complex models with little data at hand, you might need to give some extra information to your algorithm, and the Bayesian framework does exactly that (with some additional niceties that you'll discover along the way).
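As one possible illustration (not the only Bayesian route), the brms package lets you state such a weakly informative prior directly; `d`, `y`, `x_z` and `g` are hypothetical placeholders:

```r
library(brms)

# Logistic mixed model with a normal(0, 2) prior on all slopes:
# with standardized predictors this gently discourages implausibly
# large effects without overriding what the data say
m_bayes <- brm(y ~ x_z + (1 | g), family = bernoulli, data = d,
               prior = prior(normal(0, 2), class = "b"))
```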
Some references:
Two references that helped me a lot in grasping the concept of convergence and the way the algorithms work:
- Ben Bolker's ecological modelling book is a must-read; Chapter 7 on optimization is particularly relevant for understanding convergence.
- Richard McElreath's Statistical Rethinking is a great book; it is way broader than the topic of this post, but reading the first few chapters will change the way you understand models.
Thank you