Week 8 - Tick Phenology & Prescribed Burning: Generalized Estimating Equations for Repeated-Measures Count Data
1 Week Overview
| Session | Content | Duration |
|---|---|---|
| Session A Part 1 | Student-led paper discussion (Gleim et al. 2014) | ~30 min |
| Session A Part 2 | Instructor-led GEE theory mini-lecture (slides) | ~15 min |
| Session A Part 3 | Group worksheet critique + R workshop + wrap-up | ~30 min |
Paper: Gleim, E. R., Conner, L. M., Berghaus, R. D., Levin, M. L., Zemtsova, G. E., & Yabsley, M. J. (2014). The Phenology of Ticks and the Effects of Long-Term Prescribed Burning on Tick Population Dynamics in Southwestern Georgia and Northwestern Florida. PLOS ONE, 9(11), e112174. https://doi.org/10.1371/journal.pone.0112174
By the end of class, students will be able to:
- Explain why GEE is appropriate for clustered/repeated-measures count data
- Write the GEE estimating equation and identify each component (\(\mathbf{D}_i\), \(\mathbf{V}_i\), \(\boldsymbol{\mu}_i\), \(R(\alpha)\))
- Compare exchangeable, AR(1), and independence working correlation structures and articulate when each is defensible
- Evaluate whether the paper’s analytic choices (negative binomial family, log link, exchangeable correlation) align with the data structure
- Implement a GEE in R using `geepack::geeglm` and interpret robust (sandwich) standard errors
2 Schedule
Entry Ticket — 0:00–0:05 (5 min)
Activity type: In-class formative pre-assessment
Before we begin today’s discussion, take 5 minutes to answer these questions about the paper you read. You can work in groups.
| Field | Question |
|---|---|
| Objective | In one sentence, what is the paper’s primary research objective? |
| Research_Question | State the main research question (what, in whom/what, compared to what, measured how). |
| Outcome_Variable | What is the response/outcome variable? What type of data is it? |
| Predictors | List the main predictor(s)/covariate(s). |
| Cluster_ID_and_time_structure | What are the clusters (sampling units)? How many? How many repeated time points? |
| Analysis_method_guess | What statistical method did the authors use (or what do you think they should have used)? |
| One_method_question | Write one question about the statistical method that you want answered today. |
5 minutes for discussion
Student Paper Presentation - 0:10–0:40 (30 min)
Presenter: Silas
2.1 ⚡ Interaction Activity: Challenge Questions
Procedure:
1. Each group (~2) presents their challenge question (30 sec each, ~3 groups max given time).
2. The presenter must respond to each challenge question directly (45–60 sec per response).
3. Instructor may supplement if the presenter’s answer is incomplete or if an important concept was missed.
GEE Theory Mini-Lecture — 0:40–0:55 (15 min)
Format: Instructor-led with slides (slides/slides-ticks-gee.qmd).
Slides covered: Slides 4–8 (Why GEE?, Estimating Equation, Sandwich Variance, Paper’s Model Spec, Working Correlation Comparison).
2.1.1 GEE Estimating Equation
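The GEE estimate \(\hat{\boldsymbol{\beta}}\) solves the estimating equation below, which sums one contribution from each cluster \(i = 1, \dots, K\):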
\[ \sum_{i=1}^{K} \mathbf{D}_i^\top \mathbf{V}_i^{-1}\left(\mathbf{Y}_i - \boldsymbol{\mu}_i\right) = \mathbf{0} \]
where:
| Symbol | Definition |
|---|---|
| \(K\) | Number of clusters (here: 21 plots) |
| \(\mathbf{Y}_i\) | \(n_i \times 1\) vector of observed tick counts for plot \(i\) |
| \(\boldsymbol{\mu}_i\) | \(n_i \times 1\) vector of marginal means, \(\mu_{it} = E[Y_{it} \mid \mathbf{X}_{it}]\) |
| \(\mathbf{D}_i\) | \(n_i \times p\) derivative matrix, \(\mathbf{D}_i = \partial \boldsymbol{\mu}_i / \partial \boldsymbol{\beta}\) |
| \(\mathbf{V}_i\) | Working covariance matrix, \(\mathbf{V}_i = \phi \, \mathbf{A}_i^{1/2} R(\alpha) \mathbf{A}_i^{1/2}\) |
| \(\mathbf{A}_i\) | Diagonal matrix of marginal variances |
| \(R(\alpha)\) | Working correlation matrix (exchangeable, AR(1), or independence) |
| \(\phi\) | Dispersion (scale) parameter; \(\phi > 1\) indicates overdispersion relative to the assumed variance function |
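To make the symbols concrete, here is a minimal base-R sketch of one cluster's contribution \(\mathbf{D}_i^\top \mathbf{V}_i^{-1}(\mathbf{Y}_i - \boldsymbol{\mu}_i)\) under a log link. Everything in it is an illustrative assumption rather than data from the paper: the four counts, the `canopy` covariate, the trial `beta`, the values of `alpha` and `phi`, and the Poisson-type variance function (variance equal to the mean).

```r
# One hypothetical plot with 4 monthly visits (all numbers are illustrative).
X_i  <- cbind(intercept = 1, canopy = c(0.2, 0.4, 0.6, 0.8))  # n_i x p design
beta <- c(1.0, -0.5)                                          # trial coefficients
y_i  <- c(3, 0, 2, 1)                                         # observed counts

mu_i <- as.vector(exp(X_i %*% beta))  # marginal means under a log link
D_i  <- mu_i * X_i                    # d mu / d beta = diag(mu_i) %*% X_i

# Working covariance V_i = phi * A^(1/2) R(alpha) A^(1/2)
alpha <- 0.3
phi   <- 1.5
n_i   <- length(y_i)
A_i   <- diag(mu_i)                   # assumed variance function: Var = mu
R_i   <- (1 - alpha) * diag(n_i) + alpha * matrix(1, n_i, n_i)  # exchangeable
V_i   <- phi * sqrt(A_i) %*% R_i %*% sqrt(A_i)

# This cluster's p x 1 contribution to the estimating equation
U_i <- t(D_i) %*% solve(V_i) %*% (y_i - mu_i)
U_i
```

The fitting algorithm searches for the `beta` at which these contributions, summed over all plots, equal the zero vector.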
2.1.2 Sandwich (Robust) Variance
\[ \widehat{\text{Var}}(\hat{\boldsymbol{\beta}}) = \mathbf{B}^{-1} \mathbf{M} \mathbf{B}^{-1} \]
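where \(\mathbf{B} = \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \mathbf{D}_i\) (bread) and \(\mathbf{M} = \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \text{Cov}(\mathbf{Y}_i) \mathbf{V}_i^{-1} \mathbf{D}_i\) (meat). In practice \(\text{Cov}(\mathbf{Y}_i)\) is estimated by the residual outer product \((\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)(\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)^\top\), which is why the resulting standard errors remain valid even when \(R(\alpha)\) is misspecified.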
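The bread and meat can be computed by hand in the simplest case. The sketch below assumes an independence working correlation with \(\phi = 1\), a Poisson-type variance, and a log link, so that \(\mathbf{D}_i^\top \mathbf{V}_i^{-1}\) reduces to \(\mathbf{X}_i^\top\) and the point estimates coincide with an ordinary Poisson `glm` fit. The data are simulated; the 12 visits per plot, the `burned` covariate, and the gamma plot effect are assumptions for illustration only.

```r
set.seed(8)
K <- 21; n_per <- 12                               # 21 plots, 12 visits (assumed)
plot_id <- rep(seq_len(K), each = n_per)
burned  <- rep(rep(c(0, 1), length.out = K), each = n_per)  # plot-level covariate
# A shared plot effect induces within-plot correlation in the counts.
plot_effect <- rep(rgamma(K, shape = 2, rate = 2), each = n_per)
y <- rpois(K * n_per, lambda = plot_effect * exp(1 - 0.5 * burned))

fit <- glm(y ~ burned, family = poisson)  # same estimates as an independence GEE
X   <- model.matrix(fit)
mu  <- as.vector(fitted(fit))
r   <- y - mu                             # raw residuals

# With V_i = A_i and a log link, D_i' V_i^{-1} = X_i'.
B <- crossprod(X, mu * X)                 # bread: sum_i X_i' diag(mu_i) X_i
S <- rowsum(X * r, group = plot_id)       # K x p; row i holds (X_i' r_i)'
M <- crossprod(S)                         # meat: sum_i (X_i' r_i)(X_i' r_i)'

V_robust  <- solve(B) %*% M %*% solve(B)
se_robust <- sqrt(diag(V_robust))         # sandwich standard errors
se_naive  <- sqrt(diag(solve(B)))         # model-based, ignores clustering
rbind(se_naive, se_robust)
```

Comparing the two rows shows how much the model-based standard errors change once within-plot correlation is acknowledged.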
2.1.3 Paper’s Inferred Model Specification
The paper (Gleim et al. 2014) does not provide an explicit model equation. According to the Methods section (p. 3), the authors used GEE with a negative binomial family, a log link, and an exchangeable working correlation to account for repeated measures within plots. A defensible inferred model is:
\[ \log\!\bigl(E[\text{ticks}_{it}]\bigr) = \beta_0 + \beta_1 \cdot \text{BurnRegime}_i + \beta_2 \cdot \text{HostAbundance}_{it} + \beta_3 \cdot \text{Canopy}_{it} + \beta_4 \cdot \text{Season}_{it} \]
where \(i\) indexes plot (cluster) and \(t\) indexes monthly time point.
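A model of this form could be fit along the following lines. This is a hedged sketch, not the authors' code: the data are simulated, the column names (`plot_id`, `month`, `burn`, `ticks`) are hypothetical, only the burn term is included, and a Poisson family stands in for the paper's negative binomial, with the robust standard errors absorbing the extra variation.

```r
library(geepack)

set.seed(2014)
K <- 21; n_months <- 24                  # assumed design, not the paper's data
ticks_df <- data.frame(
  plot_id = rep(seq_len(K), each = n_months),
  month   = rep(seq_len(n_months), times = K),
  burn    = rep(rep(c("unburned", "burned"), length.out = K), each = n_months)
)
ticks_df$burn <- factor(ticks_df$burn, levels = c("unburned", "burned"))

# A shared plot effect makes counts within a plot correlated.
plot_effect <- rep(rgamma(K, shape = 2, rate = 2), each = n_months)
ticks_df$ticks <- rpois(
  nrow(ticks_df),
  plot_effect * exp(1.5 - 0.8 * (ticks_df$burn == "burned"))
)

# geeglm expects each cluster's rows to be contiguous, so sort first.
ticks_df <- ticks_df[order(ticks_df$plot_id, ticks_df$month), ]

fit <- geeglm(ticks ~ burn,
              family = poisson(link = "log"),  # stand-in for negative binomial
              data   = ticks_df,
              id     = plot_id,
              corstr = "exchangeable")

cf <- summary(fit)$coefficients  # Estimate, robust Std.err, Wald, p-value
cf
exp(cf$Estimate)                 # rate ratios on the count scale
```

Because of the log link, exponentiated coefficients read as multiplicative effects on the expected tick count; the `Std.err` column holds the sandwich estimates that `geeglm` reports by default.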
2.1.4 Working Correlation Structures
| Structure | \(R(\alpha)\) | Assumption | Appropriate when… |
|---|---|---|---|
| Independence | \(I\) | No within-cluster correlation | Within-cluster correlation is weak, or is treated as a nuisance and the number of clusters is large enough to rely on robust SEs |
| Exchangeable | \((1-\alpha)I + \alpha \mathbf{J}\) | Equal pairwise correlation at all lags | Correlation doesn’t decay with time |
| AR(1) | \(\alpha^{|t-s|}\) | Correlation decays with lag | Time series within clusters; ecological seasonal data |
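The three structures are easy to compare side by side by building them in base R. The cluster size `n_t = 4` and `alpha = 0.5` are illustrative values only, not estimates from the paper.

```r
n_t   <- 4    # repeated visits per plot (illustrative)
alpha <- 0.5  # illustrative correlation parameter

R_ind  <- diag(n_t)                                            # independence
R_exch <- (1 - alpha) * diag(n_t) + alpha * matrix(1, n_t, n_t)  # exchangeable
lag    <- abs(outer(seq_len(n_t), seq_len(n_t), "-"))          # |t - s|
R_ar1  <- alpha^lag                                            # AR(1)

R_exch[1, ]  # 1.0 0.5 0.5 0.5    : same correlation at every lag
R_ar1[1, ]   # 1.0 0.5 0.25 0.125 : correlation halves with each extra lag
```

For monthly tick counts, the contrast in the first rows is the substantive question: whether two visits eleven months apart are plausibly as correlated as two consecutive visits.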
Group Worksheet Critique — 0:55–1:05 (10 min)
Format: Small groups (2–3 students)
Instructions: Each group writes three specific critiques of the paper, as if reviewing it for a journal.
Minute Paper — 1:12–1:15 (3 min)
Format: Exit ticket (template: activities/minute_paper_template.md)
Prompts:
- What is the single most important thing you learned about GEE today?
- What is one question you still have about applying GEE to your own research data?