Recently I’ve been struggling with incorporating auto-correlation into my analyses. Auto-correlation can be accounted for with relative ease when your data are normally distributed or can be transformed to be normally distributed. However, if you’re anything like me, you are rarely so lucky. My data are generally proportions, presence/absence (binary), or counts. And in most instances my response of interest is over-dispersed, highly zero-inflated, and utterly impervious to transformations. In short, it’s ugly.
Enter Generalized Estimating Equations (GEEs). Before I delve into the wonders that are GEEs, a caveat – I’m an ecology graduate student trying to navigate the rapidly expanding world of statistics. I am by no means a statistician. As such I’m going to limit my discussion to the general strengths and weaknesses of GEEs. For a more thorough discussion including the statistical nitty-gritty, I would recommend Zuur et al. (2009), Hocking (2012), and references therein.
First introduced by Liang and Zeger (1986), GEEs extend generalized linear models (GLMs) by incorporating a correlation structure among observations. Simply put, this means GEEs can accommodate data that are both auto-correlated and non-normal. Finally, a way to deal with my ugly data.
Advantages of GEEs:
- Can model non-normal responses. As an extension of GLMs, GEEs model the mean of the response through a link function, paired with the mean-variance relationship of a familiar family (Poisson, binomial, etc.).
- Users specify an association structure to describe the relationship among observations of the response within a subject or cluster. This can be used to accommodate longitudinal or spatial data, interacting individuals, or situations in which responses are related up to a threshold distance or time.
- GEEs include a scale parameter that accommodates over-dispersion.
- Assumptions of homogeneity of variances are relaxed.
- Models provide population-level (marginal) estimates, making them computationally simpler than GLMMs.
- GEEs perform better than GLMMs when there are few observations of each of many subjects.
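To make the association structure and the over-dispersion term a little more concrete, here is a minimal sketch in plain Python. It is not taken from geepack or any other package: the function names `working_correlation` and `pearson_dispersion` are hypothetical, the three structures shown are simply common textbook choices, and the dispersion formula is one common moment estimator for a Poisson-type variance. The toy numbers are made up purely for illustration.

```python
# Illustrative sketch only. Function names are hypothetical and do not
# correspond to the API of geepack, SAS, or any other GEE software.

def working_correlation(structure, n, alpha=0.0):
    """Build an n x n working correlation matrix (list of lists).

    structure: "independence", "exchangeable", or "ar1"
    alpha:     assumed correlation parameter (ignored for independence)
    """
    if structure not in ("independence", "exchangeable", "ar1"):
        raise ValueError(f"unknown structure: {structure}")
    matrix = []
    for j in range(n):
        row = []
        for k in range(n):
            if j == k:
                row.append(1.0)                    # each observation with itself
            elif structure == "independence":
                row.append(0.0)                    # no within-subject correlation
            elif structure == "exchangeable":
                row.append(alpha)                  # same correlation for every pair
            else:                                  # "ar1"
                row.append(alpha ** abs(j - k))    # correlation decays with lag
        matrix.append(row)
    return matrix


def pearson_dispersion(observed, fitted, n_params):
    """Moment estimate of the scale parameter, assuming variance = scale * mean.

    Sums the squared Pearson residuals and divides by (N - p).
    """
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / (len(observed) - n_params)


# Three repeated measures on one subject, correlation decaying with time lag.
ar1 = working_correlation("ar1", 3, alpha=0.5)

# Made-up counts and fitted means from a hypothetical one-parameter model.
scale = pearson_dispersion([0, 4, 1, 7], [2.0, 2.0, 4.0, 4.0], n_params=1)
```

A scale estimate well above 1 suggests the counts are more variable than a plain Poisson model would expect, which is the situation the over-dispersion term is meant to absorb.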
Limitations of GEEs:
- GEEs use quasi-likelihood estimation rather than full maximum likelihood, so likelihood-based tools (such as likelihood ratio tests and AIC) are not appropriate for testing fit, comparing models, or conducting inference about parameters. But there are other options (check out Dan Hocking’s blog on it).
- GEEs don’t give subject-specific estimates.
- These models perform poorly when there are many observations from a handful of subjects.
GEEs have many strengths, and seem ideally suited for dealing with ugly data, particularly when the response of a population is of greatest interest. Additionally, they can be implemented in most statistical programs, including R (geepack) and SAS. Despite this, GEEs have not gained traction among ecologists. Does anyone have thoughts as to why?
Hocking, D. J. 2012. The role of red-backed salamanders in ecosystems. Dissertation. University of New Hampshire.
Liang, K.-Y., and S. L. Zeger. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73:13–22.
Zuur, A. F., E. N. Ieno, N. J. Walker, A. A. Saveliev, and G. M. Smith. 2009. Mixed effects models and extensions in ecology with R. Springer, New York.