Conflated Inverse Modeling for Urban Vegetation Patterns
Conference: CVPR 2026
arXiv: 2604.13028
Code: Available
Area: Remote Sensing / Urban Computing
Keywords: Inverse Modeling, Diffusion Model, Urban Vegetation, Land Surface Temperature, NDVI
TL;DR
A framework that conflates a forward prediction model with a diffusion-based inverse generative model to produce diverse yet physically plausible urban vegetation spatial configurations (NDVI patterns) for a specified temperature change target. It reaches 3.4× the diversity of a deterministic regression baseline while reducing temperature error by 37% (RMSE from 2.85°C to 1.79°C).
Background & Motivation
State of the field: Urban areas are increasingly affected by heat extremes, and vegetation regulates urban microclimates through shading and evapotranspiration. Forward models (predicting temperature given vegetation) are mature, but the inverse problem (determining vegetation configurations given temperature targets) remains largely unexplored.
Existing limitations: The inverse problem is inherently ill-posed: multiple spatial vegetation arrangements can produce similar aggregated temperature responses. Traditional regression and deterministic neural networks cannot capture this ambiguity and tend to produce averaged solutions. Data scarcity exacerbates the issue, because observations of the same urban area under different vegetation scenarios are unavailable.
Core tension: The approach must simultaneously achieve diversity (varied spatial vegetation configurations) and specificity (all satisfying the specified temperature target), yet these combinations may not exist in the training data.
Objective: To learn a conditional generative model that produces diverse NDVI patterns satisfying regional temperature targets under building height constraints.
Approach: Vegetation-driven temperature regulation is modeled as a generative inverse problem, with temperature constraints enforced at the aggregated regional scale to preserve spatial diversity.
Core idea: A forward model supervises the inverse diffusion model at the regional scale for temperature consistency, rather than pixel-level constraints, thereby generating diverse vegetation spatial configurations while guaranteeing temperature targets.
Method
Overall Architecture
The framework comprises three components: (1) a U-Net forward model that predicts land surface temperature change from NDVI and building height; (2) a diffusion inverse model that generates NDVI conditioned on building height and coarsened temperature maps; (3) conflated training where the forward model's regional mean temperature prediction provides additional supervision during inverse model training.
Key Designs
- Forward-Inverse Conflated Training:
  - Function: Simultaneously achieving diversity and temperature specificity
  - Core idea: The inverse diffusion model generates NDVI from coarsened temperature conditions and building height. During training, the generated NDVI is fed into a frozen forward model to predict the regional mean temperature, and the discrepancy with the ground-truth temperature serves as an additional loss. Temperature constraints are imposed at the aggregated regional scale rather than the pixel scale
  - Design motivation: Pixel-level constraints over-constrain the generation and eliminate diversity; regional-level constraints preserve spatial degrees of freedom
- Coarsened Temperature Conditioning:
  - Function: Preventing fine-grained temperature maps from over-determining the generation
  - Core idea: The temperature condition is spatially coarsened (downsampled) so that the model only needs to satisfy regional temperature trends rather than the exact spatial temperature distribution
  - Design motivation: Directly conditioning on fine temperature maps creates a one-to-one mapping, eliminating diversity
- EDM Diffusion Framework:
  - Function: Robust conditional generation
  - Core idea: Based on EDM (Elucidating Diffusion Models) preconditioning and sampling design, with conditional input as a 2-channel stack of building height and coarsened temperature
  - Design motivation: EDM offers superior training efficiency and robustness compared to standard DDPM
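The coarsened conditioning and the EDM-style preconditioning described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the coarsening factor of 8, block averaging followed by repetition back to full resolution, `sigma_data = 0.5` (the EDM default), and channel concatenation as the way the condition enters the network are all assumptions, and `coarsen`, `build_condition`, `denoise`, and `zero_net` are hypothetical names.

```python
import math
import numpy as np

SIGMA_DATA = 0.5  # default from the EDM paper; assumed here, not confirmed for NDVI


def coarsen(temp_map: np.ndarray, factor: int) -> np.ndarray:
    """Block-average a 2D map, then repeat each block mean back to full size."""
    h, w = temp_map.shape
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by the coarsening factor")
    blocks = temp_map.reshape(h // factor, factor, w // factor, factor)
    block_means = blocks.mean(axis=(1, 3))
    return np.repeat(np.repeat(block_means, factor, axis=0), factor, axis=1)


def build_condition(building_height, temp_change, factor=8):
    """Stack building height and coarsened temperature change into 2 channels."""
    return np.stack([building_height, coarsen(temp_change, factor)], axis=0)


def edm_coefficients(sigma: float, sigma_data: float = SIGMA_DATA):
    """EDM preconditioning scalars (c_skip, c_out, c_in, c_noise) for noise level sigma."""
    total = sigma**2 + sigma_data**2
    return (
        sigma_data**2 / total,
        sigma * sigma_data / math.sqrt(total),
        1.0 / math.sqrt(total),
        math.log(sigma) / 4.0,
    )


def denoise(raw_network, noisy_ndvi, sigma, cond):
    """Preconditioned denoiser: skip connection plus scaled network output."""
    c_skip, c_out, c_in, c_noise = edm_coefficients(sigma)
    # Assumption: the condition is concatenated with the scaled noisy NDVI channel.
    net_in = np.concatenate([c_in * noisy_ndvi[None], cond], axis=0)  # (3, H, W)
    return c_skip * noisy_ndvi + c_out * raw_network(net_in, c_noise)


rng = np.random.default_rng(0)
height = rng.uniform(0.0, 50.0, size=(64, 64))   # synthetic building heights
d_temp = rng.normal(0.0, 1.0, size=(64, 64))     # synthetic temperature change
cond = build_condition(height, d_temp, factor=8)

# Untrained placeholder network: always predicts zeros with the NDVI map's shape.
zero_net = lambda net_in, c_noise: np.zeros(net_in.shape[1:])
out = denoise(zero_net, rng.normal(size=(64, 64)), sigma=0.5, cond=cond)
print(cond.shape, out.shape)
```

Within each 8×8 block the temperature channel carries only the block mean, so the generator sees the regional trend but not the pixel-level layout, which is the property the design motivation relies on.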
Loss Function / Training Strategy
The training objective combines the diffusion denoising loss with a forward-model temperature consistency loss (MSE on the regional mean). The forward model is trained independently first, then frozen during inverse model training. Training is conducted on Landsat 8 satellite imagery from 20 U.S. cities.
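A rough sketch of how the two loss terms might be combined is given here. Everything specific in it is an assumption: the weight `lam`, the unweighted MSE form of the denoising term, and in particular `toy_forward_model`, a hypothetical linear stand-in for the frozen U-Net that exists only so the example runs. In actual training the gradient of the consistency term would flow through the frozen forward model into the diffusion model; this NumPy version only evaluates the loss values.

```python
import numpy as np


def regional_mean(x: np.ndarray) -> np.ndarray:
    """Average each map in a batch of shape (B, H, W) over its spatial extent."""
    return x.mean(axis=(-2, -1))


def toy_forward_model(ndvi: np.ndarray, height: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the frozen forward U-Net (not the paper's model):
    more vegetation cools, taller buildings warm slightly."""
    return -2.0 * ndvi + 0.01 * height


def conflated_loss(ndvi_pred, ndvi_true, height, temp_true, lam=1.0):
    """Denoising MSE plus lam times the regional-mean temperature MSE."""
    denoise_term = np.mean((ndvi_pred - ndvi_true) ** 2)
    temp_pred = toy_forward_model(ndvi_pred, height)
    consistency_term = np.mean(
        (regional_mean(temp_pred) - regional_mean(temp_true)) ** 2
    )
    return denoise_term + lam * consistency_term, denoise_term, consistency_term


rng = np.random.default_rng(0)
ndvi_true = rng.uniform(0.0, 1.0, size=(4, 32, 32))
height = rng.uniform(0.0, 50.0, size=(4, 32, 32))
temp_true = toy_forward_model(ndvi_true, height)

# A different spatial arrangement of the same vegetation: mirror each map.
ndvi_alt = np.flip(ndvi_true, axis=-1)
total, denoise_term, consistency_term = conflated_loss(
    ndvi_alt, ndvi_true, height, temp_true
)
print(f"denoise={denoise_term:.4f} consistency={consistency_term:.2e}")
```

With this toy forward model, the mirrored NDVI map differs from the original pixel by pixel yet leaves the regional mean temperature essentially unchanged, so the consistency term stays near zero. That illustrates why a regional-scale constraint leaves room for multiple spatial configurations.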
Key Experimental Results
Main Results
| Method | Diversity (FID diversity)↑ | Temperature Error (RMSE)↓ | Temperature Control Rate↑ |
|---|---|---|---|
| Deterministic Regression | 1.0× | 2.85°C | 62% |
| cGAN | 2.1× | 2.15°C | 71% |
| Standard Diffusion | 2.8× | 2.42°C | 68% |
| Proposed (Conflated) | 3.4× | 1.79°C | 85% |
Ablation Studies
| Configuration | Diversity | Temp. Error | Description |
|---|---|---|---|
| Full model | 3.4× | 1.79°C | Conflated training + coarsened conditioning |
| No forward model supervision | 3.8× | 2.85°C | Diverse but temperature uncontrolled |
| Fine temperature conditioning | 1.2× | 1.52°C | Accurate temperature but no diversity |
| No coarsening | 1.8× | 1.95°C | Moderate performance |
Key Findings
- Coarsened temperature conditioning is the critical design for balancing diversity and specificity
- Forward model supervision reduces temperature error from 2.85°C to 1.79°C (a 37% reduction), at the cost of a small drop in diversity from 3.8× to 3.4×
- The model can generate NDVI-temperature combinations absent from the training data
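The 37% figure can be checked directly from the RMSE values reported in the tables. The short calculation below applies the same relative-reduction formula to the other baselines in the main results table; the helper name `relative_reduction` is illustrative only.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Fractional drop in error when moving from baseline to proposed."""
    return (baseline - proposed) / baseline


# Temperature RMSE values (°C) copied from the main results table.
baseline_rmse = {
    "Deterministic Regression": 2.85,
    "cGAN": 2.15,
    "Standard Diffusion": 2.42,
}
proposed_rmse = 1.79

reductions = {
    name: relative_reduction(value, proposed_rmse)
    for name, value in baseline_rmse.items()
}
for name, frac in reductions.items():
    print(f"{name}: {frac:.1%}")
```

The 2.85°C baseline is also the error of the "No forward model supervision" ablation, so the same arithmetic covers the 37% reduction attributed to forward model supervision.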
Highlights & Insights
- Modeling the inverse problem as conditional generation rather than optimization suits the problem well: it acknowledges the one-to-many nature of the mapping from temperature targets to vegetation layouts
- The trade-off between regional-level and pixel-level constraints is an insight that may generalize to other generation tasks requiring both diversity and constraint satisfaction
- Validation across 20 cities spanning different climate zones supports the claim of generalizability
Limitations & Future Work
- Only NDVI and building height are considered, neglecting water bodies, roads, and other land cover types
- Satellite image resolution (30 m Landsat) limits fine-scale vegetation planning
- Only temperature change is validated; the actual impact on human thermal exposure is not evaluated
- Extension to multi-objective optimization (temperature + carbon sequestration + biodiversity) is a promising direction
Related Work & Inspiration
- vs. Traditional urban planning optimization: Optimization methods produce a single deterministic solution, whereas this work generates multiple feasible plans for planners to choose from
- vs. DiffusionSat: DiffusionSat generates satellite imagery, while this work uses diffusion models to generate vegetation configurations satisfying physical constraints
Rating
- Novelty: ⭐⭐⭐⭐ Both the forward-inverse conflated framework and regional-level constraints are novel
- Experimental rigor: ⭐⭐⭐⭐ 20 cities + detailed ablations
- Writing quality: ⭐⭐⭐⭐ Clear problem formalization
- Impact: ⭐⭐⭐⭐ Practical value for urban climate adaptation planning