Tick Phenology & Prescribed Burning: GEE for Repeated-Measures Count Data

SNR 690 - Contemporary Approaches to Quantitative Methods

Initial question

“Why not just use a mixed model?”

GLMMs are usually the default for repeated measures, right?

The main difference between GLMM and GEE is interpretation

  • Use GLMM when you want to make inference about a specific cluster (or individual) (e.g., plot, lake, animal) → the conditional mean.
  • Use GEE when you want to make inference about the population-average effect across all clusters → the marginal mean.

Marginal vs Conditional Inference/Mean

  • Marginal mean: \(E[Y]\) - the average response across the entire population of clusters (plots). This is what GEE estimates.
  • The marginal mean is the overall expected value of a variable by itself, averaging over (i.e., ignoring) other variables.
  • Interpretation: “What is the average value of Y in the whole population/sample?”
  • Example: If Y is income, the marginal mean is the average income across everyone, regardless of age, education, etc.

Marginal vs Conditional Inference/Mean

  • Conditional mean: \(E[Y \mid \mathbf{b}]\) - the expected response for a specific cluster (plot) given its random effect. This is what GLMM estimates.
  • The conditional mean is the expected value of a variable given specific values of other variables (e.g., random effects).
  • Notation: \(E[Y \mid \mathbf{b}]\) means the expected value of Y given the random effects b.
  • Interpretation: “What is the expected value of Y for a specific cluster (plot) given its unique characteristics (random effect)?”
  • Example: \(E[Y \mid \text{education} = \text{college}]\) is the expected income given a college education; in a GLMM, the conditioning variable is the cluster’s random effect \(b_i\).

Marginal means we are averaging over the random effects

It is a weighted average of the conditional means across all clusters, where the weights are determined by the distribution of the random effects:

\[E[Y] = \int E[Y \mid \mathbf{b}]\, f(\mathbf{b})\, d\mathbf{b}\]

::: {.notes}
Expected Y equals the integral of expected Y given b times f(b) db
:::

  • “The overall mean of \(Y\) is the average of the conditional mean of \(Y\), averaged over how likely each \(\mathbf{b}\) value is.”
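To make the conditional/marginal gap concrete, here is a minimal base-R simulation sketch. It assumes a Poisson GLMM with a random intercept \(b \sim N(0, \sigma^2)\) and a log link; the intercept and SD below are made-up values, not estimates from the paper.

```r
# Marginal vs conditional mean under a log link (simulation sketch).
set.seed(1)
beta0 <- 1.0                      # fixed intercept (log scale), made up
sigma <- 0.8                      # SD of random cluster intercepts, made up
b     <- rnorm(1e5, 0, sigma)     # simulate many clusters' random effects

conditional_mean <- exp(beta0)             # E[Y | b = 0]: a "typical" cluster
marginal_mean    <- mean(exp(beta0 + b))   # E[Y]: average over the b distribution

# With a log link, averaging over b inflates the mean:
# E[Y] = exp(beta0 + sigma^2 / 2) > exp(beta0)
c(conditional = conditional_mean,
  marginal    = marginal_mean,
  theory      = exp(beta0 + sigma^2 / 2))
```

The marginal mean exceeds the conditional mean of a typical cluster, which is exactly why GEE and GLMM coefficients differ under nonlinear links.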

Why GEE? Clustered Data & Marginal Inference

The data structure problem:

  • 21 plots, each measured 24 times (monthly, 2010–2011)
  • Tick counts within a plot are correlated (not independent)
  • Ordinary GLM assumes independence → inflated Type I error
  • GLMM models this correlation with random effects → estimates conditional mean
  • GEE models this correlation with a working covariance → estimates marginal mean

Two solutions:

| Approach | What it models | Interpretation |
|----------|----------------|----------------|
| GLMM | Conditional mean \(E[Y \mid \mathbf{b}]\) | Effect within a specific plot |
| GEE | Marginal mean \(E[Y]\) | Effect averaged across all plots |

For policy-relevant questions - “Does burning reduce ticks on average?” - GEE is a better choice.

The GEE Estimating Equation

\[ \sum_{i=1}^{K} \mathbf{D}_i^\top \mathbf{V}_i^{-1}\left(\mathbf{Y}_i - \boldsymbol{\mu}_i\right) = \mathbf{0} \]

| Symbol | Meaning |
|--------|---------|
| \(K = 21\) | Number of plots (clusters) |
| \(\mathbf{Y}_i\) | Observed tick counts for plot \(i\) (length 24) |
| \(\boldsymbol{\mu}_i = E[\mathbf{Y}_i \mid \mathbf{X}_i]\) | Predicted marginal means |
| \(\mathbf{D}_i = \partial \boldsymbol{\mu}_i / \partial \boldsymbol{\beta}\) | Derivative of mean w.r.t. parameters |
| \(\mathbf{V}_i = \phi\,\mathbf{A}_i^{1/2} R(\alpha) \mathbf{A}_i^{1/2}\) | Working covariance |
| \(R(\alpha)\) | Working correlation matrix |
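As a concrete sketch, the working covariance \(\mathbf{V}_i = \phi\,\mathbf{A}_i^{1/2} R(\alpha) \mathbf{A}_i^{1/2}\) can be assembled by hand for one small cluster. The means, scale, and correlation below are invented for illustration (Poisson variance function, exchangeable correlation).

```r
# Assembling V_i for one plot with 3 hypothetical monthly observations.
mu    <- c(2, 5, 9)   # hypothetical monthly means for one plot
phi   <- 1.4          # scale (overdispersion) parameter, made up
alpha <- 0.3          # working exchangeable correlation, made up

A_half <- diag(sqrt(mu))                    # A_i^{1/2}: sqrt of variance function
R      <- matrix(alpha, 3, 3); diag(R) <- 1 # R(alpha): exchangeable correlation
V      <- phi * A_half %*% R %*% A_half     # V_i = phi * A^{1/2} R A^{1/2}
V                                           # diagonal = phi * mu
```

The diagonal of \(\mathbf{V}_i\) is \(\phi \mu_{it}\) (overdispersed Poisson variance); the off-diagonals carry the working correlation.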

Robust standard errors with the sandwich estimator

  • Robust SEs give us good uncertainty estimates even when the working correlation is wrong. This is critical because we rarely know the true correlation structure.

  • We usually assume the variance has a correct structure:

  • Normal: \(\text{Var}(Y_{it}) = \sigma^2\)

  • Poisson: \(\text{Var}(Y_{it}) = \lambda_{it}\)

  • Negative binomial: \(\text{Var}(Y_{it}) = \mu_{it} + \frac{\mu_{it}^2}{k}\)

  • What happens if those variance assumptions are violated?

  • What happens to our estimates?

  • What happens to our standard errors - and to the decisions we base on them?

Sandwich (Robust) Variance

\[ \widehat{\text{Var}}(\hat{\boldsymbol{\beta}}) = \mathbf{B}^{-1} \mathbf{M} \mathbf{B}^{-1} \]

  • \(\mathbf{B} = \displaystyle\sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \mathbf{D}_i\)   (the bread -> model-based information)
  • \(\mathbf{M} = \displaystyle\sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \widehat{\text{Cov}}(\mathbf{Y}_i) \mathbf{V}_i^{-1} \mathbf{D}_i\)   (the meat -> empirical)
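The bread-meat-bread structure can be seen numerically by comparing model-based and cluster-robust SEs for a Poisson GLM on simulated clustered counts. This sketch uses `vcovCL()` from the `sandwich` package; all data here are simulated, not from the paper.

```r
# Model-based vs cluster-robust ("sandwich") SEs on simulated clustered counts.
library(sandwich)
set.seed(2)
dat   <- data.frame(id = rep(1:30, each = 8), x = rnorm(240))
dat$b <- rep(rnorm(30, 0, 0.5), each = 8)   # shared cluster effect -> correlation
dat$y <- rpois(240, exp(0.5 + 0.3 * dat$x + dat$b))

fit <- glm(y ~ x, family = poisson, data = dat)
se_model  <- sqrt(diag(vcov(fit)))                      # bread only: assumes independence
se_robust <- sqrt(diag(vcovCL(fit, cluster = dat$id)))  # bread-meat-bread sandwich
rbind(model_based = se_model, robust = se_robust)       # robust SEs reflect clustering
```

With positive within-cluster correlation, the model-based SEs understate the true uncertainty; the sandwich SEs correct this.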

🥖 Bread (Model-Based)

  • Comes from the assumed model
  • Uses design matrix (X) and weights (W)
  • Represents: > “How precise estimates would be if the model were correct”

🥩 Meat (Data-Based)

  • Comes from observed residual variability
  • Uses cluster-level score contributions \(\mathbf{D}_i^\top \mathbf{V}_i^{-1} (\mathbf{Y}_i - \boldsymbol{\mu}_i)\)
  • Represents: > “How much variability we actually saw in the data”

🧠 Intuition

  • Bread = theory
  • Meat = reality
  • Sandwich = robust standard errors

Key insight: the model gives the structure; the data give the variability. The sandwich estimator combines both to produce robust standard errors.

When to use?

  • The choice depends not only on the type of mean structure (marginal vs conditional) but also on the number of clusters.
  • The sandwich estimator generally needs many clusters to perform well; a common rule of thumb is at least 40. With fewer clusters, it can be biased and produce anti-conservative (too narrow) confidence intervals.

Warning

With only \(K = 21\) clusters, the sandwich estimator may be anti-conservative (too narrow CIs). Small-sample corrections (e.g., std.err = "jack" in geepack) are advisable.
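A minimal geepack sketch of requesting the jackknife correction mentioned above. The data are simulated placeholders (21 clusters of 6 observations) purely to make the call concrete.

```r
# Jackknife SEs in geepack for small K (simulated placeholder data).
library(geepack)
set.seed(3)
dat   <- data.frame(id = rep(1:21, each = 6), x = rnorm(126))
dat$y <- rpois(126, exp(0.4 + 0.2 * dat$x))

fit_jack <- geeglm(
  y ~ x,
  id      = id,
  data    = dat,
  family  = poisson(link = "log"),
  corstr  = "exchangeable",
  std.err = "jack"        # approximate jackknife instead of the plain sandwich
)
summary(fit_jack)
```

geepack also offers `"j1s"` and `"fij"` as alternative small-sample variance estimators.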

The Paper’s Model Specification

Note

Note: Gleim et al. (2014) do not provide an explicit model equation. The following is an inferred, defensible specification based on their Methods section (p. 3).

\[ \log\!\bigl(E[\text{ticks}_{it}]\bigr) = \beta_0 + \beta_1\,\text{BurnRegime}_i + \beta_2\,\text{HostAbundance}_{it} + \beta_3\,\text{Canopy}_{it} + \beta_4\,\text{Season}_{it} + \cdots \]

  • Family: Negative binomial (overdispersed counts; 47,185 total; 99% A. americanum)
  • Link: Log (counts → multiplicative effects on tick abundance)
  • Correlation: Exchangeable (equal correlation among all 24 monthly observations per plot)
  • Clusters: 21 plots (7.2–78.9 ha); ≥10 years of consistent burn management
  • Predictors: Burn regime (BB, BUB, UBB, UBUB), host abundance (cameras), canopy closure, tree density, ground cover, temperature, precipitation
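A hedged sketch of how this inferred specification might be run. geepack has no native negative binomial family, so a common workaround is a Poisson GEE whose estimated scale parameter absorbs overdispersion. All variable names and data below are simulated placeholders, not the paper's data.

```r
# Sketch of the inferred specification (simulated placeholder data).
library(geepack)
set.seed(4)
tick_dat <- data.frame(
  plot        = rep(1:21, each = 24),   # 21 plots x 24 monthly visits
  burn_regime = rep(sample(c("BB", "BUB", "UBB", "UBUB"), 21, replace = TRUE),
                    each = 24),
  canopy      = runif(21 * 24, 20, 95),
  season      = factor(rep(rep(c("winter", "spring", "summer", "fall"),
                               each = 3), 42))
)
tick_dat$ticks <- rpois(nrow(tick_dat), 3)

fit_ticks <- geeglm(
  ticks ~ burn_regime + canopy + season,
  id     = plot,
  data   = tick_dat,
  family = poisson(link = "log"),   # scale parameter soaks up overdispersion
  corstr = "exchangeable"           # as described in the paper's Methods
)
summary(fit_ticks)
```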

Working Correlation Structures

| Structure | Formula | Assumption | Best for… |
|-----------|---------|------------|-----------|
| Independence | \(R = I\) | Zero correlation | Sensitivity check; large \(K\) |
| Exchangeable | \(\text{Corr}(Y_{it}, Y_{is}) = \alpha\) | Equal correlation at all lags | Short series; random-effects analogy |
| AR(1) | \(\text{Corr}(Y_{it}, Y_{is}) = \alpha^{|t-s|}\) | Correlation decays with lag | Time series; seasonal ecological data |

For monthly tick counts over 2 years, AR(1) is more ecologically realistic than exchangeable - but GEE estimates are consistent under either.

Model selection: Use QIC (quasi-likelihood under independence model criterion) to compare working correlation structures.
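A sketch of that comparison using geepack's `QIC()` (lower is better). The data are simulated placeholders, ordered by `id` and then `time` as AR(1) requires.

```r
# Comparing working correlation structures with QIC (simulated data).
library(geepack)
set.seed(5)
dat   <- data.frame(id = rep(1:21, each = 24), time = rep(1:24, 21),
                    x = rnorm(504))
dat$y <- rpois(504, exp(0.5 + 0.2 * dat$x))

fit_ind  <- geeglm(y ~ x + time, id = id, data = dat,
                   family = poisson, corstr = "independence")
fit_exch <- geeglm(y ~ x + time, id = id, data = dat,
                   family = poisson, corstr = "exchangeable")
fit_ar1  <- geeglm(y ~ x + time, id = id, data = dat,
                   family = poisson, corstr = "ar1")

QIC(fit_ind); QIC(fit_exch); QIC(fit_ar1)   # prefer the structure with lowest QIC
```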

How to run it?

In R, you can use the geepack package:

library(geepack)

fit_gee <- geeglm(
  y ~ x1 + x2 + time,
  id      = subject_id,            # cluster / repeated unit
  data    = dat,                   # rows must be ordered by id, then time, for AR(1)
  family  = poisson(link = "log"), # count outcome; use binomial(link = "logit") for 0/1
  corstr  = "ar1"                  # working AR(1) correlation
)

summary(fit_gee)

Group Worksheet Critique - 0:55–1:05 (10 min)

Format: Small groups (2–3 students)

Instructions: 1. Each group writes 3 specific critiques of the paper, as if you were peer reviewers

Before You Leave

On an index card:

1. What is the single most important thing you learned about GEE today?

2. What is one question you still have about GEE, or how might you apply GEE to your own research data?

Be specific. “I learned GEE exists” is not an acceptable answer for question 1.