Scalable Vision-Guided Crop Yield Estimation

Conference: AAAI 2026
arXiv: 2511.12999
Code: https://github.com/medhanieirgau/scalable-vision-guided-crop-yield-estimation
Area: Agricultural AI / Computer Vision Applications
Keywords: crop yield estimation, prediction-powered inference, computer vision, uncertainty quantification, agricultural insurance

TL;DR

This paper proposes a crop yield estimation method based on Prediction-Powered Inference (PPI++), which leverages vision models trained on field photographs to supplement costly ground-truth crop cut measurements. The approach guarantees asymptotic unbiasedness while increasing effective sample size by up to 73%, enabling more accurate and cost-efficient regional yield estimation for agricultural insurance.

Background & Motivation

Background: Accurate regional average crop yield estimation is critical for agricultural monitoring and insurance decision-making. Current approaches primarily rely on field crop cuts, which are time-consuming and expensive.
Limitations of Prior Work: Field photographs and aerial imagery have been widely studied as cheaper alternatives for yield estimation, but their explanatory power is limited in complex smallholder farming environments (R² of only about 0.5), and their predictions may be biased, so they cannot directly replace ground measurements for insurance and reinsurance purposes.
Key Challenge: Photographs are cheap but insufficiently accurate, while crop cuts are accurate but expensive. The central challenge is how to use photographs to supplement crop cuts without introducing bias.
Goal: To improve estimation precision by incorporating additional photographic data while guaranteeing that regional average yield estimates remain asymptotically unbiased.
Key Insight: The paper adopts the Prediction-Powered Inference (PPI++) framework, using CV model predictions recalibrated through a "control function" as auxiliary information rather than as direct substitutes for ground measurements.
Core Idea: PPI++ employs a tuning coefficient \(\hat{\lambda}\) to adaptively balance photographic predictions and ground-truth measurements, so that the asymptotic variance is never larger than that of the labeled-only estimate, regardless of CV model quality.

Method

Overall Architecture

The input consists of two sets of field data: \(n\) labeled samples (containing crop cut ground truth \(Y_i\), photographs \(V_i\), and coordinates \(X_i\)) and \(N\) unlabeled samples (photographs and coordinates only). A ResNet-50 first predicts yield from photographs as \(\hat{Y}_i = g(V_i)\); a control function \(f(W_i)\) is then learned to map predictions and coordinates to more accurate yield estimates. The regional average yield estimate is computed via the PPI++ formula \(\hat{\theta}_{\text{PPI++}} = \hat{\theta}_{\text{lbl}} - \hat{\lambda}(\bar{f}_n - \bar{f}_N)\), where \(\hat{\theta}_{\text{lbl}}\) is the mean of the labeled crop cuts and \(\bar{f}_n\) and \(\bar{f}_N\) are the means of \(f(W_i)\) over the labeled and unlabeled samples, respectively. Confidence intervals are constructed using BCa bootstrap.

Key Designs

PPI++ Estimator
- Function: Combines a small set of labeled data with a large set of unlabeled photographic data to produce a regional average yield estimate that is asymptotically unbiased and whose asymptotic variance is no larger than that of the labeled-only mean.
- Mechanism: The PPI++ estimator is defined as \(\hat{\theta}_{\text{PPI++}} = \hat{\theta}_{\text{lbl}} - \hat{\lambda}(\frac{1}{n}\sum_{i=1}^n f(W_i) - \frac{1}{N}\sum_{i=n+1}^{n+N} f(W_i))\), where \(\hat{\lambda} = \frac{N}{n+N} \frac{\hat{\text{cov}}(Y, f(W))}{\hat{\text{Var}}(f(W))}\) adaptively minimizes asymptotic variance. When \(f\) approximates the true conditional mean \(\mu(w)\), this is equivalent to the semiparametrically efficient AIPW estimator.
- Design Motivation: Unlike fixing \(\lambda=1\) (original PPI) or \(\lambda=N/(n+N)\) (AIPW), PPI++ uses a data-driven \(\hat{\lambda}\) that adapts to the actual quality of the learned \(f\), yielding greater robustness in small-sample settings.
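The estimator above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' implementation: the function name `ppi_pp_mean` and the toy numbers are hypothetical, and it assumes that the covariance and variance in \(\hat{\lambda}\) are estimated on the labeled sample only.

```python
from statistics import fmean


def ppi_pp_mean(y_lab, f_lab, f_unlab):
    """PPI++ estimate of a mean from labeled yields and control-function values.

    y_lab   : crop-cut yields Y_i on the n labeled fields
    f_lab   : control-function values f(W_i) on the same n labeled fields
    f_unlab : control-function values f(W_i) on the N unlabeled fields
    """
    n, N = len(y_lab), len(f_unlab)
    y_bar = fmean(y_lab)      # theta_lbl: plain mean of the crop cuts
    f_bar_n = fmean(f_lab)    # mean of f over labeled fields
    f_bar_N = fmean(f_unlab)  # mean of f over unlabeled fields

    # Sample covariance of (Y, f) and variance of f.
    # Assumption: both are estimated on the labeled sample only.
    cov_yf = sum((y - y_bar) * (f - f_bar_n) for y, f in zip(y_lab, f_lab)) / (n - 1)
    var_f = sum((f - f_bar_n) ** 2 for f in f_lab) / (n - 1)

    # lambda_hat = N / (n + N) * cov(Y, f) / Var(f); use 0 if f is constant.
    lam = (N / (n + N)) * cov_yf / var_f if var_f > 0 else 0.0

    theta = y_bar - lam * (f_bar_n - f_bar_N)
    return theta, lam


# Hypothetical toy data: 4 labeled fields, 12 unlabeled fields.
theta, lam = ppi_pp_mean(
    y_lab=[2.0, 3.0, 4.0, 5.0],
    f_lab=[2.5, 3.0, 3.5, 4.0],
    f_unlab=[3.0, 4.0] * 6,
)
print(theta, lam)
```

In the toy data the labeled fields have a lower average \(f\) than the unlabeled ones, so the correction term shifts the labeled-only mean upward. If \(f\) carried no signal, \(\hat{\lambda}\) would be near zero and the estimate would fall back to the labeled-only mean.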
Cross-Regional Control Function Learning
- Function: Addresses the challenge that single-region sample sizes (approximately 20 fields) are too small to learn a robust control function.
- Mechanism: All regional data within a first-level administrative division (state/province) of a country are pooled, and cross-validated LASSO is used to learn \(f_r(\cdot) = \hat{\beta}_r^\top \psi(\cdot)\), where the feature vector \(\psi(W) = (1, \hat{Y}, X)^\top\) contains the photographic model prediction and the coordinates, augmented with second-order interaction terms. Pooling may introduce asymptotic bias due to cross-regional heterogeneity, but substantially reduces finite-sample variance.
- Design Motivation: With only approximately 20 observations per region, nonparametric methods are infeasible. LASSO-regularized linear models offer a more favorable bias–variance tradeoff for small-sample settings. Experiments confirm that province-level pooling outperforms both national-level pooling and single-region learning.
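A rough sketch of the control-function fit, using only the standard library, is given below. Several details are assumptions rather than facts from the paper: the names `psi`, `soft_threshold`, and `lasso_cd` are hypothetical, the feature map is assumed to include squared terms as well as pairwise products, the intercept is left unpenalized, and the penalty `alpha` is passed in directly instead of being chosen by cross-validation.

```python
def psi(y_hat, lat, lon):
    """Feature map: intercept, linear terms, and all second-order products.

    Assumption: squares are included alongside pairwise interactions.
    """
    base = [y_hat, lat, lon]
    feats = [1.0] + base
    for i in range(3):
        for j in range(i, 3):
            feats.append(base[i] * base[j])
    return feats


def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + alpha * sum_{j>=1} |b_j|.

    Column 0 of X is treated as an unpenalized intercept.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            shrunk = rho if j == 0 else soft_threshold(rho, alpha)
            new = shrunk / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


# Hypothetical toy check: y = 1 + 2x is recovered when alpha = 0.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]]
y = [-1.0, 1.0, 3.0]
print(lasso_cd(X, y, alpha=0.0))
```

In practice the features would usually be standardized before fitting, and a library routine such as scikit-learn's `LassoCV` could handle the penalty selection. The hand-rolled solver here only shows how the penalty zeroes out weak terms in \(\psi(W)\) when the pooled sample is small.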
BCa Bootstrap Confidence Intervals (PPBootBCa)
- Function: Constructs confidence intervals for the PPI++ estimator that retain close-to-nominal coverage in small samples.
- Mechanism: The bias-corrected and accelerated (BCa) bootstrap adjusts the bootstrap quantiles using two parameters. The procedure is: (a) draw B=1000 bootstrap resamples and compute \(\hat{\theta}_{\text{PPI++}}^{(b)}\) on each; (b) compute the bias-correction parameter \(z_0 = \Phi^{-1}(B^{-1}\sum_{b=1}^{B} \mathbf{1}[\hat{\theta}^{(b)} \leq \hat{\theta}])\); and (c) compute the acceleration parameter \(\gamma\) via the jackknife.
- Design Motivation: Yield data are typically skewed and zero-inflated (especially for maize), causing standard normal asymptotic intervals to undercover in small samples. BCa bootstrap possesses second-order asymptotic properties that correct for skewness.
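Steps (a) to (c) can be sketched for a generic estimator as follows. This is an illustrative simplification, not the paper's PPBootBCa procedure: it resamples a single list of observations, whereas a PPI++ version would also need to resample the unlabeled fields, and the function name `bca_interval` and the toy yields are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def bca_interval(data, estimator, B=1000, level=0.95, seed=0):
    """BCa bootstrap interval for estimator(data), where data is a list."""
    data = list(data)
    n = len(data)
    rng = random.Random(seed)
    std = NormalDist()
    theta_hat = estimator(data)

    # (a) Bootstrap replicates of the estimator, sorted for quantile lookup.
    boot = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)]) for _ in range(B)
    )

    # (b) Bias correction z0 from the share of replicates at or below theta_hat.
    frac = sum(t <= theta_hat for t in boot) / B
    frac = min(max(frac, 1 / (B + 1)), B / (B + 1))  # keep inv_cdf finite
    z0 = std.inv_cdf(frac)

    # (c) Acceleration from leave-one-out (jackknife) estimates.
    jack = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    j_bar = fmean(jack)
    num = sum((j_bar - t) ** 3 for t in jack)
    den = 6.0 * sum((j_bar - t) ** 2 for t in jack) ** 1.5
    gamma = num / den if den > 0 else 0.0

    def adjusted_quantile(q):
        # Map the nominal level q to a BCa-adjusted level, then read it off.
        z = std.inv_cdf(q)
        adj = std.cdf(z0 + (z0 + z) / (1 - gamma * (z0 + z)))
        idx = min(max(int(adj * B), 0), B - 1)
        return boot[idx]

    a = (1 - level) / 2
    return adjusted_quantile(a), adjusted_quantile(1 - a)


# Hypothetical skewed, zero-inflated yields for one region (t/ha).
yields = [0.0, 0.0, 0.0, 0.4, 0.7, 0.9, 1.1, 1.2, 1.5, 1.6,
          1.8, 2.0, 2.3, 2.5, 2.9, 3.4, 4.1, 5.0, 6.2, 8.8]
print(bca_interval(yields, fmean))
```

With right-skewed data the adjusted quantile levels shift upward, so the interval is no longer symmetric in probability around the bootstrap median. A normal-approximation interval cannot capture this asymmetry.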

Loss & Training

The CV model is a ResNet-50 fine-tuned from ImageNet-pretrained weights by minimizing MSE loss with the Adam optimizer for 10 epochs. Five-fold cross-fitting is applied so that each field's prediction comes from a model that was not trained on that field. The primary evaluation metric is within-region R² rather than cross-region R².
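The cross-fitting step can be illustrated independently of the vision model. The sketch below is a generic out-of-fold prediction loop under stated assumptions: `cross_fit_predict` and the `fit` callback are hypothetical names, folds are assigned by shuffling individual fields (the paper may split differently, for example by region), and a trivial mean predictor stands in for the fine-tuned ResNet-50.

```python
import random
from statistics import fmean


def cross_fit_predict(features, targets, fit, k=5, seed=0):
    """Return out-of-fold predictions for every sample.

    fit(train_features, train_targets) must return a callable model that
    maps one feature to one prediction. Each sample is predicted by a
    model trained on the other k - 1 folds only.
    """
    n = len(features)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k roughly equal folds

    preds = [None] * n
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in order if i not in held]
        model = fit(
            [features[i] for i in train_idx],
            [targets[i] for i in train_idx],
        )
        for i in held_out:
            preds[i] = model(features[i])
    return preds


def fit_mean(train_features, train_targets):
    """Stand-in model: always predicts the training-set mean yield."""
    mean_yield = fmean(train_targets)
    return lambda feature: mean_yield


# Hypothetical data: 10 fields with placeholder features.
features = list(range(10))
targets = [1.0, 2.0, 1.5, 3.0, 2.5, 0.5, 4.0, 3.5, 2.0, 1.0]
print(cross_fit_predict(features, targets, fit_mean))
```

Using out-of-fold predictions keeps \(\hat{Y}_i\) from being fit to the same \(Y_i\) it is later compared against, which would otherwise overstate how much signal the photographs carry.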

Key Experimental Results

Main Results

Dataset: approximately 20,000 real crop cuts with field photographs (Nigeria rice; Zambia/Zimbabwe maize):

Country–Year	Crop	Regions	Fields	Within-Region R²	Cross-Region R²
Nigeria 2022	Rice	29	826	0.198	0.666
Zambia 2023	Maize	126	3,759	0.145	0.201
Zambia 2024	Maize	342	10,727	0.143	0.404
Zimbabwe 2024	Maize	87	4,173	0.261	0.448

Effective sample size gains (\(N/n=4\)):

Method	Rice (NG) Gain	Maize Gain
PPI++ (ppipp)	up to 73%	12–23%
AIPW	slightly lower	unstable
PPI (\(\lambda=1\))	sometimes increases variance	negative
nophoto (coordinates only)	moderate	moderate

Ablation Study

Configuration	Performance	Notes
Province-level pooling (recommended)	Best	Optimal bias–variance tradeoff
National-level pooling	Slightly worse	Excessive heterogeneity introduces bias
Single-region learning	Worst	Too few samples, unstable
LASSO	Best	Suited for small samples
Random forest	Worse	Higher overfitting risk
BCa bootstrap	Best coverage	Second-order asymptotic properties
CLT normal interval	Undercoverage	Fails under skewed data

Key Findings

Within-region R² is substantially lower than cross-region R² (0.14–0.26 vs. 0.20–0.67), indicating that photographic signals primarily capture between-region rather than within-region variation.
Even with within-region R² as low as 0.2, PPI++ yields significant effective sample size gains, since the asymptotic relative efficiency is approximately \((1 - R^2 \cdot N/(N+n))^{-1}\).
Improvements for rice substantially exceed those for maize (73% vs. 12–23%), potentially because rice field photographs exhibit more visually distinctive features.
Adaptive adjustment of \(\lambda\) is critical: fixing \(\lambda=1\) (original PPI) can sometimes increase variance relative to using the labeled data alone.
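The efficiency formula and the risk of fixing \(\lambda=1\) can both be checked with a short calculation. For a fixed \(\lambda\) and independent labeled and unlabeled samples, the variance of \(\hat{\theta}_{\text{lbl}} - \lambda(\bar{f}_n - \bar{f}_N)\) is \(\text{Var}(Y)/n - 2\lambda\,\text{Cov}(Y,f)/n + \lambda^2\,\text{Var}(f)(1/n + 1/N)\). The sketch below evaluates it for hypothetical inputs: unit variances, a squared correlation of 0.2, and \(n=20\), \(N=80\). The function names are illustrative only.

```python
import math


def ppi_variance(lam, var_y, var_f, cov_yf, n, N):
    """Variance of theta_lbl - lam * (f_bar_n - f_bar_N) for a fixed lam."""
    return var_y / n - 2 * lam * cov_yf / n + lam ** 2 * var_f * (1 / n + 1 / N)


def optimal_lambda(var_f, cov_yf, n, N):
    """Minimizer of ppi_variance in lam: N / (n + N) * cov / var_f."""
    return (N / (n + N)) * cov_yf / var_f


def ess_gain(r2, n, N):
    """Relative efficiency 1 / (1 - R^2 * N / (N + n)) versus labeled-only."""
    return 1.0 / (1.0 - r2 * N / (N + n))


# Hypothetical inputs: Var(Y) = Var(f) = 1 and R^2 = 0.2, so Cov = sqrt(0.2).
n, N = 20, 80
var_y, var_f, cov_yf = 1.0, 1.0, math.sqrt(0.2)

lam_star = optimal_lambda(var_f, cov_yf, n, N)
v_labeled = ppi_variance(0.0, var_y, var_f, cov_yf, n, N)    # labeled-only mean
v_tuned = ppi_variance(lam_star, var_y, var_f, cov_yf, n, N)  # PPI++
v_fixed = ppi_variance(1.0, var_y, var_f, cov_yf, n, N)       # original PPI

print(v_labeled, v_tuned, v_fixed)
print(v_labeled / v_tuned, ess_gain(0.2, n, N))
```

Under these assumed inputs the tuned estimator's variance is below the labeled-only variance by the factor the formula predicts, while \(\lambda=1\) pushes the variance above the labeled-only baseline. That is the failure mode the adaptive coefficient is designed to avoid.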

Highlights & Insights

Statistically guaranteed AI-assisted decision-making: CV model predictions serve as auxiliary variables in statistical inference rather than direct replacements for ground measurements. The estimator's asymptotic variance is guaranteed not to exceed that of the labeled-only estimate, regardless of model quality, which provides a paradigm for deploying AI in high-stakes decisions (insurance, policy).
First large-scale application of the PPI framework in agriculture: The theoretical guarantees are validated on 584 regions with nearly 20,000 real field observations, confirming finite-sample effectiveness.
Adaptation of BCa bootstrap for PPI: Resolves insufficient confidence interval coverage under skewed, zero-inflated distributions.

Limitations & Future Work

No truly unlabeled data exist in the current datasets; the unlabeled setting is simulated via bootstrap, and real-world deployment performance remains to be verified.
The relatively low within-region R² constrains the upper bound of achievable gains; stronger CV models (e.g., using high-resolution UAV imagery or multi-temporal data) could further improve performance.
Only latitude and longitude are used as covariates; incorporating soil type, weather, and other features may yield additional improvements.
The method assumes that labeled and unlabeled fields within a dataset are i.i.d. draws from the same distribution; in practice, systematic differences between the two groups may exist and would undermine the unbiasedness guarantee.

vs. traditional remote sensing yield estimation: Traditional methods directly substitute remote sensing for ground measurements, introducing bias; this paper uses remote sensing as an auxiliary rather than a substitute, preserving unbiasedness.
vs. PPI++ (the original statistical framework): This paper contributes innovations in cross-regional control function learning and BCa bootstrap adaptation.
The proposed framework is transferable to other "cheap proxy + expensive ground truth" estimation problems (e.g., medical imaging-assisted diagnosis, remote sensing-assisted population estimation).

Rating

Novelty: ⭐⭐⭐ Methodologically, the work is primarily an application of PPI++; innovations lie in control function learning and BCa bootstrap adaptation
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Validated at the scale of nearly 20,000 real observations with rigorous theoretical proofs
Writing Quality: ⭐⭐⭐⭐⭐ Clear structure with tight integration of theory and experiments
Value: ⭐⭐⭐⭐ Practical applicability to agricultural insurance in developing countries