Week 8 - Tick Phenology & Prescribed Burning: Generalized Estimating Equations for Repeated-Measures Count Data

1 Week Overview

| Session | Content | Duration |
|---|---|---|
| Session A Part 1 | Student-led paper discussion (Gleim et al. 2014) | ~30 min |
| Session A Part 2 | Instructor-led GEE theory mini-lecture (slides) | ~15 min |
| Session A Part 3 | Group worksheet critique + R workshop + wrap-up | ~30 min |

Paper: Gleim, E. R., Conner, L. M., Berghaus, R. D., Levin, M. L., Zemtsova, G. E., & Yabsley, M. J. (2014). The Phenology of Ticks and the Effects of Long-Term Prescribed Burning on Tick Population Dynamics in Southwestern Georgia and Northwestern Florida. PLOS ONE, 9(11), e112174. https://doi.org/10.1371/journal.pone.0112174


Learning Objectives

By the end of class, students will be able to:

  1. Explain why GEE is appropriate for clustered/repeated-measures count data
  2. Write the GEE estimating equation and identify each component (\(\mathbf{D}_i\), \(\mathbf{V}_i\), \(\boldsymbol{\mu}_i\), \(R(\alpha)\))
  3. Compare exchangeable, AR(1), and independence working correlation structures and articulate when each is defensible
  4. Evaluate whether the paper’s analytic choices (negative binomial family, log link, exchangeable correlation) align with the data structure
  5. Implement a GEE in R using geepack::geeglm and interpret robust (sandwich) standard errors

2 Schedule

Entry Ticket — 0:00–0:05 (5 min)

Activity type: In-class formative pre-assessment

Before we begin today’s discussion, take 5 minutes to answer these questions about the paper you read. You can work in groups.

| Field | Question |
|---|---|
| Objective | In one sentence, what is the paper’s primary research objective? |
| Research_Question | State the main research question (what, in whom/what, compared to what, measured how). |
| Outcome_Variable | What is the response/outcome variable? What type of data is it? |
| Predictors | List the main predictor(s)/covariate(s). |
| Cluster_ID_and_time_structure | What are the clusters (sampling units)? How many? How many repeated time points? |
| Analysis_method_guess | What statistical method did the authors use (or what do you think they should have used)? |
| One_method_question | Write one question about the statistical method that you want answered today. |

5 minutes for discussion

Student Paper Presentation - 0:10–0:40 (30 min)

Presenter: Silas

2.1 ⚡ Interaction Activity: Challenge Questions

Procedure:

  1. Each group (~2) presents their challenge question (30 sec each, ~3 groups max given time).
  2. The presenter must respond to each challenge question directly (45–60 sec per response).
  3. Instructor may supplement if the presenter’s answer is incomplete or if an important concept was missed.

GEE Theory Mini-Lecture — 0:40–0:55 (15 min)

Format: Instructor-led with slides (slides/slides-ticks-gee.qmd).

Slides covered: Slides 4–8 (Why GEE?, Estimating Equation, Sandwich Variance, Paper’s Model Spec, Working Correlation Comparison).

2.1.1 GEE Estimating Equation

The GEE estimate \(\hat{\boldsymbol{\beta}}\) solves the following estimating equation, summed over clusters \(i = 1, \dots, K\):

\[ \sum_{i=1}^{K} \mathbf{D}_i^\top \mathbf{V}_i^{-1}\left(\mathbf{Y}_i - \boldsymbol{\mu}_i\right) = \mathbf{0} \]

where:

| Symbol | Definition |
|---|---|
| \(K\) | Number of clusters (here: 21 plots) |
| \(\mathbf{Y}_i\) | \(n_i \times 1\) vector of observed tick counts for plot \(i\) |
| \(\boldsymbol{\mu}_i\) | \(n_i \times 1\) vector of marginal means, \(\mu_{it} = E[Y_{it} \mid \mathbf{X}_{it}]\) |
| \(\mathbf{D}_i\) | \(n_i \times p\) derivative matrix, \(\mathbf{D}_i = \partial \boldsymbol{\mu}_i / \partial \boldsymbol{\beta}\) |
| \(\mathbf{V}_i\) | \(n_i \times n_i\) working covariance matrix, \(\mathbf{V}_i = \phi \, \mathbf{A}_i^{1/2} R(\alpha) \mathbf{A}_i^{1/2}\) |
| \(\mathbf{A}_i\) | \(n_i \times n_i\) diagonal matrix of variance functions \(v(\mu_{it})\), so the marginal variance is \(\phi \, v(\mu_{it})\) |
| \(R(\alpha)\) | \(n_i \times n_i\) working correlation matrix (exchangeable, AR(1), or independence) |
| \(\phi\) | Dispersion (scale) parameter, which absorbs overdispersion |

2.1.2 Sandwich (Robust) Variance

\[ \widehat{\text{Var}}(\hat{\boldsymbol{\beta}}) = \mathbf{B}^{-1} \mathbf{M} \mathbf{B}^{-1} \]

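where \(\mathbf{B} = \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \mathbf{D}_i\) (bread) and \(\mathbf{M} = \sum_i \mathbf{D}_i^\top \mathbf{V}_i^{-1} \text{Cov}(\mathbf{Y}_i) \mathbf{V}_i^{-1} \mathbf{D}_i\) (meat). In practice \(\text{Cov}(\mathbf{Y}_i)\) is estimated by the residual outer product \((\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)(\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)^\top\), which is why the resulting standard errors remain valid with many clusters even when \(R(\alpha)\) is misspecified.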
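The bread and meat can be computed by hand in base R. The sketch below is an illustration on simulated counts, not the paper's data or the authors' code. It assumes a Poisson variance function, a log link, an independence working correlation, and \(\phi = 1\), in which case the GEE point estimate coincides with the ordinary GLM fit. The variable names (`plot`, `burn`, `ticks`) are hypothetical.

```r
# Simulated illustration only: 21 plots x 12 monthly visits (not the paper's data).
set.seed(1)
K   <- 21
n_t <- 12
dat <- data.frame(
  plot = rep(seq_len(K), each = n_t),
  burn = rep(rep(c(0, 1), length.out = K), each = n_t)  # hypothetical plot-level indicator
)
# A shared plot effect induces within-plot correlation in the counts.
plot_eff  <- rep(rnorm(K, sd = 0.5), each = n_t)
dat$ticks <- rpois(nrow(dat), lambda = exp(1 - 0.7 * dat$burn + plot_eff))

# Under an independence working correlation, the GEE estimate equals the GLM estimate.
fit <- glm(ticks ~ burn, family = poisson(link = "log"), data = dat)
X   <- model.matrix(fit)
mu  <- fitted(fit)
r   <- dat$ticks - mu

# Bread: sum_i D_i' V_i^{-1} D_i with D_i = diag(mu_i) X_i and V_i = diag(mu_i).
B <- crossprod(X, X * mu)

# Meat: sum_i (X_i' r_i)(X_i' r_i)', using the residual outer product for Cov(Y_i).
score_by_plot <- rowsum(X * r, group = dat$plot)  # K x p matrix, one row per plot
M <- crossprod(score_by_plot)

V_robust  <- solve(B) %*% M %*% solve(B)
se_robust <- sqrt(diag(V_robust))
se_naive  <- sqrt(diag(vcov(fit)))   # model-based SEs that ignore clustering
cbind(estimate = coef(fit), se_naive, se_robust)
```

With a plot-level predictor and positively correlated counts within plots, the robust standard error for `burn` will typically exceed the naive one, though the size of the gap depends on the simulated data.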

2.1.3 Paper’s Inferred Model Specification

Note

The paper (Gleim et al. 2014) does not provide an explicit model equation. According to the Methods section (p. 3), the authors used GEE with a negative binomial family, a log link, and an exchangeable working correlation to account for repeated measures within plots. A defensible inferred model is:

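\[ \log\!\bigl(E[\text{ticks}_{it}]\bigr) = \beta_0 + \beta_1 \cdot \text{BurnRegime}_i + \beta_2 \cdot \text{HostAbundance}_{it} + \beta_3 \cdot \text{Canopy}_{it} + \beta_4 \cdot \text{Season}_{it} \]

where \(i\) indexes plot (cluster) and \(t\) indexes monthly time point. Because GEE models the marginal mean, the equation has no additive error term; within-plot dependence enters through the working correlation \(R(\alpha)\) instead. For categorical predictors such as burn regime and season, each \(\beta\) stands for a set of coefficients on indicator variables.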
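A model of this form can be sketched with `geepack::geeglm`. The example below runs on simulated data with hypothetical column names (`plot`, `month`, `burn`, `season`, `ticks`) and omits host abundance and canopy for brevity. It uses a Poisson family with a log link as a stand-in for the paper's negative binomial family, so it should be read as an illustration of the workflow rather than a reproduction of the authors' analysis.

```r
library(geepack)

# Simulated data for illustration only; all variable names are hypothetical.
set.seed(42)
K   <- 21   # plots (clusters)
n_t <- 24   # monthly visits per plot
dat <- data.frame(
  plot  = rep(seq_len(K), each = n_t),
  month = rep(seq_len(n_t), times = K)
)
dat$burn   <- factor(rep(rep(c("unburned", "burned"), length.out = K), each = n_t),
                     levels = c("unburned", "burned"))
dat$season <- factor(ifelse(((dat$month - 1) %% 12) %in% 3:8, "warm", "cool"))
plot_eff   <- rep(rnorm(K, sd = 0.4), each = n_t)
dat$ticks  <- rpois(nrow(dat),
                    exp(1.5 - 0.8 * (dat$burn == "burned") +
                          0.6 * (dat$season == "warm") + plot_eff))

# Rows of the same cluster should be contiguous and ordered in time.
dat <- dat[order(dat$plot, dat$month), ]

fit <- geeglm(ticks ~ burn + season,
              id     = plot,
              data   = dat,
              family = poisson(link = "log"),   # stand-in for negative binomial
              corstr = "exchangeable")

summary(fit)     # the Std.err column reports robust (sandwich) standard errors
exp(coef(fit))   # rate ratios on the count scale
```

Changing `corstr` to `"ar1"` or `"independence"` refits the same mean model under a different working correlation, which is a simple way to check how sensitive the estimates and robust standard errors are to that choice.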

2.1.4 Working Correlation Structures

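| Structure | \(R(\alpha)\) | Assumption | Appropriate when… |
|---|---|---|---|
| Independence | \(\mathbf{I}\) | No within-cluster correlation | The number of clusters is large and the correlation is treated as a nuisance, with inference resting on robust standard errors |
| Exchangeable | \((1-\alpha)\mathbf{I} + \alpha \mathbf{J}\), where \(\mathbf{J}\) is a matrix of ones | Equal pairwise correlation at all lags | Correlation doesn’t decay with time |
| AR(1) | \(R_{ts} = \alpha^{\lvert t-s \rvert}\) | Correlation decays with lag | Equally spaced time series within clusters; ecological seasonal data |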
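To make the three structures concrete, the base R sketch below builds each working correlation matrix for a cluster with four time points and then assembles a working covariance matrix \(\mathbf{V}_i = \phi \, \mathbf{A}_i^{1/2} R(\alpha) \mathbf{A}_i^{1/2}\). The values of `alpha`, `phi`, and `mu` are arbitrary illustrative choices, and a Poisson-type variance function \(v(\mu) = \mu\) is assumed.

```r
n     <- 4     # time points in one cluster (illustrative)
alpha <- 0.5   # arbitrary working correlation parameter

R_ind  <- diag(n)                                           # independence
R_exch <- (1 - alpha) * diag(n) + alpha * matrix(1, n, n)   # exchangeable
R_ar1  <- alpha^abs(outer(seq_len(n), seq_len(n), "-"))     # AR(1): alpha^|t - s|

# Working covariance for one cluster, assuming v(mu) = mu (Poisson-type variance).
mu     <- c(2, 5, 9, 4)    # hypothetical marginal means
phi    <- 1.5              # hypothetical dispersion parameter
A_half <- diag(sqrt(mu))
V_exch <- phi * A_half %*% R_exch %*% A_half

round(R_ar1, 3)
round(V_exch, 3)
```

The diagonal of `V_exch` equals `phi * mu` regardless of the correlation structure chosen; only the off-diagonal entries change when `R_exch` is swapped for `R_ar1` or `R_ind`.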

Group Worksheet Critique — 0:55–1:05 (10 min)

Format: Small groups (2–3 students)

Instructions: 1. Each group gives three specific critiques of the paper, written from the perspective of a peer reviewer.


Minute Paper — 1:12–1:15 (3 min)

Format: Exit ticket (template: activities/minute_paper_template.md)

Prompts:

  1. What is the single most important thing you learned about GEE today?
  2. What is one question you still have about applying GEE to your own research data?