How not to control for multiple testing

While reading the methods section of a recent article by Solivares et al., I came upon the following paragraph: "The inclusion of many predictors in statistical models increases the chance of type I error (false positives). To account for this we used a Bernoulli process to detect false discovery rates, where the probability (P) of …
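To make the quoted concern concrete, here is a minimal Python sketch of how the chance of at least one false positive grows with the number of tests. It is not the method used in the article; it assumes the tests are independent and that every null hypothesis is true, and the function name `familywise_error_rate` is made up for this illustration.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent tests,
    each run at significance level alpha, assuming every null is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Show how quickly the error rate grows at alpha = 0.05
for n_tests in (1, 5, 20):
    print(n_tests, round(familywise_error_rate(0.05, n_tests), 3))
```

Under these assumptions, 20 tests at the 5% level already give roughly a 64% chance of at least one false positive.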

Making a case for hierarchical generalized models

Science is also about convincing others, be it your supervisors or your collaborators, that what you intend to do is relevant to getting the answers sought. This “what” can be anything from designing an experiment to collecting samples or formatting the data. This post was inspired by a recent discussion with one of my supervisors …

Crossed and Nested hierarchical models with STAN and R

Below I will expand on previous posts on Bayesian regression modelling using STAN (see previous instalments here, here, and here). Today's topic is modelling crossed and nested designs in hierarchical models using STAN in R. Crossed designs appear when we have more than one grouping variable and when data are recorded for each …

Hierarchical models with RStan (Part 1)

Real-world data sometimes show complex structures that call for the use of special models. When data are organized in more than one level, hierarchical models are the most relevant tool for data analysis. One classic example is student performance recorded across different schools: you might decide to record student-level variables (age, ethnicity, social …

Plotting regression curves with confidence intervals for LM, GLM and GLMM in R

[Updated 22nd January 2017, corrected mistakes for getting the fixed effect estimates of factor variables that need to be averaged out] Once models have been fitted, checked, and re-checked, the time comes to interpret them. The easiest way to do so is to plot the response variable versus the explanatory variables (I call them …

Count data: To Log or Not To Log

Count data are widely collected in ecology, for example when one counts the number of birds or the number of flowers. These data naturally follow a Poisson or negative binomial distribution and are therefore sometimes tricky to fit with standard LMs. A traditional approach has been to log-transform such data and then fit LMs to …
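As a small illustration of what log-transforming counts involves, here is a hedged Python sketch. The counts are made up, and adding 1 before taking the log is just one common convention, assumed here because log(0) is undefined. It also shows that back-transforming the mean of the logged values does not recover the arithmetic mean of the counts.

```python
import math

counts = [0, 1, 2, 10]  # made-up counts, including a zero

# log(0) is undefined, so a constant (here 1) is added before transforming
logged = [math.log(y + 1) for y in counts]

# Arithmetic mean on the original count scale
mean_count = sum(counts) / len(counts)

# Mean on the log scale, transformed back to the count scale
back_transformed = math.exp(sum(logged) / len(logged)) - 1

print(mean_count, back_transformed)
```

The back-transformed value is a geometric-type mean, so it sits below the arithmetic mean whenever the counts are not all equal.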

Correlation and Linear Regression in R

Before going into complex model building, looking at relationships in the data is a sensible step to understand how your different variables relate to one another. Correlation looks at trends shared between two variables, and regression models how a response (dependent) variable depends on a predictor (independent variable), which by itself does not establish a causal relation. Correlation: As mentioned above, correlation looks at global movement …
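The difference between the two can be sketched in a few lines of plain Python. The post itself works in R; this is only an illustration with made-up data, and the helper names `pearson_r` and `ols_line` are hypothetical. Correlation is symmetric in the two variables, while the regression line depends on which variable is treated as the response.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_line(x, y):
    """Least-squares intercept and slope for the regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


x = [1, 2, 3, 4, 5]  # made-up predictor values
y = [2, 1, 4, 3, 5]  # made-up response values

print(pearson_r(x, y), pearson_r(y, x))  # same value both ways
print(ols_line(x, y))                    # intercept and slope of y on x
```

Swapping `x` and `y` leaves the correlation unchanged, whereas `ols_line(y, x)` generally describes a different line than `ols_line(x, y)`.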