Week 05: Performance and Potentiality of Camelina (Camelina sativa L. Crantz) Genotypes in Response to Sowing Date under Mediterranean Environment
1 Week Overview
| Session | Content |
|---|---|
| Session A Part 1 | Student-led paper discussion (~35 min) |
| Session A Part 2 | Method deep-dive (this document, ~35 min) |
| Session B | Hands-on coding workshop |
- Understand why split-plot designs exist and how they differ from RCBD
- Recognize that `lmer` + `anova()` is a linear mixed model, not classical ANOVA
- Identify whole-plot vs subplot error strata and why they change the F-ratio
- Interpret G × ST interactions biologically and agronomically
- Apply the Zuur data exploration checklist critically to a published paper
1.1 Background & Readings
Paper:
Angelini, L. G., Chehade, L. A., Foschi, L., & Tavarini, S. (2020). Performance and Potentiality of Camelina (Camelina sativa L. Crantz) Genotypes in Response to Sowing Date under Mediterranean Environment. Agronomy, 10(12), 1929. https://doi.org/10.3390/agronomy10121929
While reading, always think about:
- What is the **research question**? What factors are being tested?
- What is the **experimental design**? (How many factors? How are treatments arranged? What is the blocking factor?)
- What are the **response variables**? What type of data are they? (continuous, counts, proportions?)
- What statistical methods are used? Are they appropriate given the design?
- Does the Method-Question-Data Triangle align in this paper?
- What would Zuur (Week 1) say about their data exploration? Do you see evidence of it?
1.2 Response variables
| Variable | Type |
|---|---|
| Seed yield (Mg/ha) | Continuous |
| Oil content (% DW) | Continuous |
| 1000-seed weight (g) | Continuous |
| Plant height (cm) | Continuous |
| Siliques per plant | Count |
| Plant density (No./m²) | Count |
| Harvest index | Index |
Counts were analyzed with Poisson GLMs: potentially the right call (if they checked for zero-inflation). We return to this below.
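One way to probe a Poisson count model is to compare the Pearson dispersion ratio with 1 and the observed number of zeros with the number the fitted model expects. The sketch below uses simulated siliques-per-plant counts. The data frame `cnt`, its column names, and the means are assumptions for illustration, not values from the paper.

```r
# Hypothetical count data: siliques per plant for two sowing times.
set.seed(1)
cnt <- data.frame(ST = factor(rep(c("Autumn", "Spring"), each = 40)))
cnt$siliques <- rpois(nrow(cnt), lambda = ifelse(cnt$ST == "Autumn", 120, 80))

# Poisson GLM with the default log link.
fit_pois <- glm(siliques ~ ST, family = poisson, data = cnt)

# Dispersion ratio: values far above 1 suggest overdispersion.
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Zeros observed vs zeros expected under the fitted Poisson model.
zeros_observed <- sum(cnt$siliques == 0)
zeros_expected <- sum(dpois(0, lambda = fitted(fit_pois)))

print(c(dispersion = dispersion,
        zeros_observed = zeros_observed,
        zeros_expected = zeros_expected))
```

If the observed zeros clearly exceed the expected zeros, a zero-inflated model is worth considering. If only the dispersion ratio is high, a quasi-Poisson or negative binomial model may be enough.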
2 The Experimental Design
2.1 Why It Is a Split-Plot
You cannot randomize Autumn vs. Spring sowing within the same small plot — the whole field area for a given replicate must be sown in the same season. Sowing time requires a large experimental unit. This is the defining reason split-plot designs exist.
```
BLOCK (replicate, n = 4)
├── WHOLE PLOT: Autumn sowing
│   └── subplot: V1, V2, V3, V4, V5, V6, CELINE (randomized)
└── WHOLE PLOT: Spring sowing
    └── subplot: V1, V2, V3, V4, V5, V6, CELINE (randomized)
```
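The two-stage randomization implied by this layout can be sketched in base R. The seed and the object names (`layout`, `WholePlot`, `Subplot`) are assumptions for illustration; the paper does not report its actual field plan.

```r
# Randomize one year of the split-plot layout: sowing time to whole plots
# within each block, then genotypes to subplots within each whole plot.
set.seed(2020)  # arbitrary seed, only to make the layout reproducible
genotypes <- c(paste0("V", 1:6), "CELINE")
sowing    <- c("Autumn", "Spring")

layout <- do.call(rbind, lapply(1:4, function(b) {
  st_order <- sample(sowing)                     # whole-plot randomization
  do.call(rbind, lapply(seq_along(st_order), function(w) {
    data.frame(Block     = b,
               WholePlot = w,
               ST        = st_order[w],
               Subplot   = seq_along(genotypes),
               Genotype  = sample(genotypes))    # subplot randomization
  }))
}))

head(layout, 7)  # the seven subplots of the first whole plot in block 1
```

Sowing time is randomized only twice per block, whereas genotype is randomized seven times per whole plot. That difference in experimental unit is what produces two error terms.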
2.2 Two Error Strata
| Stratum | Tests | Experimental unit |
|---|---|---|
| Whole-plot error | Sowing Time (ST) | Whole plot within block |
| Subplot error | Genotype (G), G×ST | Subplot within whole plot |
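Using a single pooled error (naive ANOVA) mixes the two strata. Because whole-plot error is typically larger than subplot error, the pooled term is too small for ST, giving an anti-conservative test (false positives), and too large for G and G×ST, giving an over-conservative test (false negatives).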
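The effect of pooling can be seen on simulated data with no true sowing-time effect. This is a minimal sketch: the design is a single-year version of the trial, and the two standard deviations (0.40 for whole plots, 0.10 for subplots) are assumptions chosen to make the contrast visible.

```r
# Simulated single-year split-plot data with an explicit whole-plot error.
set.seed(7)
d <- expand.grid(Block = factor(1:4),
                 ST = factor(c("Autumn", "Spring")),
                 Genotype = factor(paste0("G", 1:7)))
wp <- interaction(d$Block, d$ST)                     # whole-plot identifier
d$yield <- 2 +
  rnorm(nlevels(wp), 0, 0.40)[as.integer(wp)] +      # whole-plot error (large)
  rnorm(nrow(d), 0, 0.10)                            # subplot error (small)

# Naive analysis: one pooled residual for every term.
naive <- summary(aov(yield ~ Block + ST * Genotype, data = d))[[1]]

# Split-plot analysis: ST is tested in the Block:ST stratum.
strata <- summary(aov(yield ~ ST * Genotype + Error(Block/ST), data = d))
wp_tab <- strata[["Error: Block:ST"]][[1]]

f_naive <- naive[trimws(rownames(naive)) == "ST", "F value"]
f_split <- wp_tab[trimws(rownames(wp_tab)) == "ST", "F value"]
print(c(naive_F_ST = f_naive, split_plot_F_ST = f_split))
```

The naive F for ST uses 39 residual degrees of freedom and a denominator dominated by the small subplot error. The split-plot F uses 3 degrees of freedom and the larger whole-plot error, so it is the smaller and more honest of the two.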
2.3 How to run as an ANOVA (not what they did, even though they claim that)
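A classical split-plot ANOVA declares the error strata explicitly through `Error()`. The sketch below fits that call on placeholder data with the same factor structure as the trial. The `yield` values are pure noise and are an assumption, so only the layout of the output is meaningful.

```r
# Placeholder data with the same factor structure as the trial
# (pure noise; only the structure matters for this illustration).
set.seed(1)
genotypes <- c(paste0("V", 1:6), "CELINE")
dat <- expand.grid(Block = factor(1:4),
                   ST = factor(c("Autumn", "Spring")),
                   Genotype = factor(genotypes, levels = genotypes),
                   Year = factor(2018:2019))
dat$yield <- rnorm(nrow(dat), mean = 2, sd = 0.15)

# Error(Block/ST) expands to Block + Block:ST: one stratum for blocks
# and one for whole plots (sowing time within block).
fit_aov <- aov(yield ~ ST * Genotype * Year + Error(Block/ST), data = dat)

print(names(fit_aov))    # the error strata, plus the intercept
print(summary(fit_aov))  # ST sits under "Error: Block:ST"
```

In the summary, ST is tested against the `Block:ST` residual, while Genotype and Genotype × ST are tested against the `Within` residual.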
2.4 What They Likely Ran
```r
library(lmerTest)  # loads lme4 and adds Satterthwaite tests to anova()

model <- lmer(yield ~ Genotype * ST * Year + (1 | Block) + (1 | Block:ST),
              data = dat)
anova(model)  # Satterthwaite F-tests
```

This is a linear mixed model. The `anova()` call uses Satterthwaite-approximated denominator degrees of freedom, which may be non-integer. Reporting this as “ANOVA” obscures what was done.
2.5 What a Significant G × ST Interaction Means
The ranking of genotypes depends on sowing time. You cannot make one variety recommendation.
2.6 Ordinal vs. Disordinal
| Type | Definition | Implication |
|---|---|---|
| Ordinal | Rankings preserved, magnitudes differ | Main effects broadly valid |
| Disordinal | Rankings cross over | Separate recommendations essential |
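A simple first check for a crossover is to rank the genotypes within each sowing time and compare the rankings. The cell means below are hypothetical values chosen for illustration, not results from the paper.

```r
# Hypothetical G x ST cell means (Mg/ha); values are illustrative only.
cell_means <- matrix(c(2.1, 1.6,
                       2.6, 1.5,
                       1.9, 1.8),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(Genotype = c("V1", "V2", "V3"),
                                     ST = c("Autumn", "Spring")))

# Rank genotypes within each sowing time (1 = highest yield).
ranks <- apply(-cell_means, 2, rank)

# Disordinal (crossover) if the ranking differs between sowing times.
is_disordinal <- !all(ranks[, "Autumn"] == ranks[, "Spring"])

print(ranks)
print(is_disordinal)
```

Ranking the means ignores their uncertainty, so a rank change between two nearly equal means is weak evidence. An interaction plot with confidence intervals is the natural follow-up.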
2.7 Tukey vs. LSD
| | LSD | Tukey HSD |
|---|---|---|
| Controls | Per-comparison alpha | Familywise error rate |
| Result | More liberal | More conservative |
| Used in paper | ✅ |
Compact letter displays (“a”, “ab”, “b”) can mask near-significant differences. Always look at the actual means and CIs.
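The difference between the two procedures can be seen by running both on the same data. This is a sketch on a hypothetical one-way layout (three genotypes, eight plots each) that deliberately ignores the split-plot structure. The unadjusted pooled-SD t-tests stand in for LSD comparisons.

```r
# Hypothetical one-way data: three genotypes, eight plots each.
set.seed(3)
d <- data.frame(Genotype = factor(rep(c("V1", "V2", "V3"), each = 8)))
d$yield <- rnorm(nrow(d),
                 mean = c(2.0, 2.2, 2.4)[as.integer(d$Genotype)],
                 sd = 0.2)

# LSD-style comparisons: pooled SD, no multiplicity adjustment.
p_lsd_mat <- pairwise.t.test(d$yield, d$Genotype,
                             p.adjust.method = "none")$p.value
p_lsd <- p_lsd_mat[lower.tri(p_lsd_mat, diag = TRUE)]  # V2-V1, V3-V1, V3-V2

# Tukey HSD: controls the familywise error rate.
tuk <- TukeyHSD(aov(yield ~ Genotype, data = d))$Genotype
p_tukey <- tuk[, "p adj"]

# Differences with Tukey confidence intervals next to both p-values.
print(data.frame(pair = rownames(tuk), diff = tuk[, "diff"],
                 lower = tuk[, "lwr"], upper = tuk[, "upr"],
                 p_lsd = p_lsd, p_tukey = p_tukey))
```

The Tukey p-value is never smaller than the unadjusted one for the same pair, which is what "more conservative" means in the table above. Printing the differences and intervals keeps the information that a letter display would hide.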
2.8 The Method–Question–Data Triangle
- QUESTION: "Best genotype × sowing time?"
- DATA: factorial, blocked, continuous + count responses
- METHOD: split-plot LMM, two error strata, Poisson for counts
Alignment? Mostly yes — but the gap between what was done (LMM) and what was reported (“ANOVA”) is a transparency issue.
2.9 The 6 Key Takeaways
- Split-plot because ST cannot be randomized at subplot level → two error strata
- `lmer` + `anova()` is a linear mixed model, not classical ANOVA; this is a reporting issue
- Year as fixed is pragmatic with n=2 but limits generalizability
- G×ST is disordinal → separate recommendations required
- Poisson for counts was correct if they checked for zero-inflation (not reported)
- Data exploration is underreported, as it usually is
2.10 Data analysis
Generate the simulated data set with the code below, then analyze it with both the `aov()` and the `lmer()` approach and check whether the results differ.
```r
# Simulate the data: 4 replications (Block), 2 sowing times (ST),
# 7 genotypes (Genotype), 2 years (Year)
set.seed(42)
genotypes <- c(paste0("V", 1:6), "CELINE")
dat <- expand.grid(Block    = factor(1:4),
                   ST       = factor(c("Autumn", "Spring")),
                   Genotype = factor(genotypes, levels = genotypes),
                   Year     = factor(2018:2019))
g_effect <- c(0.10, -0.10, 0.40, -0.20, 0.05, 0.10, 0.20)  # V1..V6, CELINE
dat$yield <- 1.8 +
  0.30 * (dat$ST == "Autumn") +                          # sowing-time main effect
  g_effect[as.integer(dat$Genotype)] +                   # genotype main effects
  0.40 * (dat$ST == "Autumn" & dat$Genotype == "V3") +   # G x ST interaction
  -0.05 * (dat$ST == "Spring" & dat$Genotype == "V5") +
  0.20 * (dat$Year == "2019") +                          # year effect
  rnorm(nrow(dat), 0, 0.15)                              # add some noise
```
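One possible way to carry out the comparison is sketched below. It uses a variant of the simulation that adds block and whole-plot random effects, so that the two error strata actually differ. The data frame `sim` and the standard deviations (0.30, 0.25, 0.15) are assumptions for illustration. The mixed-model part runs only if the `lmerTest` package is installed.

```r
# Variant of the simulation with block and whole-plot random effects added.
set.seed(42)
genotypes <- c(paste0("V", 1:6), "CELINE")
sim <- expand.grid(Block = factor(1:4),
                   ST = factor(c("Autumn", "Spring")),
                   Genotype = factor(genotypes, levels = genotypes),
                   Year = factor(2018:2019))
wp <- interaction(sim$Block, sim$ST)             # whole-plot identifier
sim$yield <- 1.8 +
  0.30 * (sim$ST == "Autumn") +
  0.40 * (sim$ST == "Autumn" & sim$Genotype == "V3") +
  rnorm(4, 0, 0.30)[as.integer(sim$Block)] +     # block effect
  rnorm(8, 0, 0.25)[as.integer(wp)] +            # whole-plot error
  rnorm(nrow(sim), 0, 0.15)                      # subplot error

# Classical split-plot ANOVA with explicit error strata.
fit_aov <- aov(yield ~ ST * Genotype * Year + Error(Block/ST), data = sim)
wp_tab <- summary(fit_aov)[["Error: Block:ST"]][[1]]
print(wp_tab)   # ST is tested on 1 and 3 df

# Mixed-model route, skipped if lmerTest is not available.
if (requireNamespace("lmerTest", quietly = TRUE)) {
  fit_lmm <- lmerTest::lmer(yield ~ Genotype * ST * Year +
                              (1 | Block) + (1 | Block:ST), data = sim)
  print(anova(fit_lmm))   # Satterthwaite F-tests
}
```

With balanced data and positive variance estimates, the two routes are expected to give matching F-tests. On the original simulation, which contains no block or whole-plot variance, `lmer()` may report a singular fit, and the denominator degrees of freedom for ST can then differ from the 3 used by `aov()`.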