
Conflated Inverse Modeling for Urban Vegetation Patterns

Conference: CVPR 2026
arXiv: 2604.13028
Code: Available
Area: Remote Sensing / Urban Computing
Keywords: Inverse Modeling, Diffusion Model, Urban Vegetation, Land Surface Temperature, NDVI

TL;DR

A framework that conflates a forward prediction model with a diffusion-based inverse generative model to produce diverse yet physically plausible urban vegetation configurations (NDVI patterns) for specified temperature-change targets. It achieves 3.4× the diversity of a deterministic regression baseline and reduces temperature error by 37% relative to the same model trained without forward-model supervision.

Background & Motivation

State of the field: Urban areas are increasingly affected by heat extremes, and vegetation regulates urban microclimates through shading and evapotranspiration. Forward models (predicting temperature given vegetation) are mature, but the inverse problem (determining vegetation configurations given temperature targets) remains largely unexplored.

Existing limitations: The inverse problem is inherently ill-posed—multiple spatial vegetation arrangements can produce similar aggregated temperature responses. Traditional regression and deterministic neural networks cannot capture this ambiguity and tend to produce averaged solutions. Data scarcity exacerbates the issue: observations of the same urban area under different vegetation scenarios are unavailable.
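The averaging failure can be illustrated with a minimal sketch. The two 2×2 NDVI patches below are made-up values, not data from the paper. They share the same mean greenness but differ in spatial arrangement. If both are equally plausible answers for one temperature target, the prediction that minimizes expected MSE is their pixel-wise mean. That mean is a flat field that matches neither pattern.

```python
# Illustrative only: two hypothetical 2x2 NDVI patches with identical mean
# greenness but different spatial arrangements (values are made up).
PATTERN_A = [[0.8, 0.2], [0.2, 0.8]]  # vegetation on the diagonal
PATTERN_B = [[0.2, 0.8], [0.8, 0.2]]  # vegetation on the anti-diagonal


def grid_mean(grid):
    """Mean over all pixels of a 2D list."""
    cells = [v for row in grid for v in row]
    return sum(cells) / len(cells)


def grid_mse(pred, target):
    """Pixel-wise mean squared error between two equally shaped 2D lists."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def mse_optimal_prediction(targets):
    """Pixel-wise mean of equally likely targets: the single output that
    minimizes average MSE, which an MSE-trained deterministic regressor
    tends toward when the input is ambiguous."""
    n = len(targets)
    rows, cols = len(targets[0]), len(targets[0][0])
    return [[sum(t[i][j] for t in targets) / n for j in range(cols)]
            for i in range(rows)]


def expected_mse(pred, targets):
    """Average MSE of one prediction against several plausible targets."""
    return sum(grid_mse(pred, t) for t in targets) / len(targets)


targets = [PATTERN_A, PATTERN_B]
blurred = mse_optimal_prediction(targets)
print(blurred)                            # flat field of 0.5 values
print(expected_mse(blurred, targets))     # ~0.09
print(expected_mse(PATTERN_A, targets))   # ~0.18
```

Committing to either real pattern doubles the expected error compared with the blurred average. This is why an MSE-trained regressor drifts toward spatially featureless outputs.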

Core tension: The approach must simultaneously achieve diversity (varied spatial vegetation configurations) and specificity (every configuration satisfying the specified temperature target), even though the required vegetation-temperature combinations may not appear in the training data.

Objective: To learn a conditional generative model that produces diverse NDVI patterns satisfying regional temperature targets under building height constraints.

Approach: Vegetation-driven temperature regulation is modeled as a generative inverse problem, with temperature constraints enforced at the aggregated regional scale to preserve spatial diversity.

Core idea: A frozen forward model supervises the inverse diffusion model by checking temperature consistency at the regional scale instead of imposing pixel-level constraints. The model can therefore generate diverse vegetation configurations while still tracking the temperature target.

Method

Overall Architecture

The framework comprises three components: (1) a U-Net forward model that predicts land surface temperature change from NDVI and building height; (2) a diffusion inverse model that generates NDVI conditioned on building height and coarsened temperature maps; (3) conflated training where the forward model's regional mean temperature prediction provides additional supervision during inverse model training.
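The conditioning input for component (2) can be sketched as follows. The sketch makes three assumptions, none of which the summary confirms:

- block averaging is the coarsening operator;
- the coarsening factor `k` is a hypothetical parameter;
- nearest-neighbour upsampling is used so the coarse temperature map can be stacked with full-resolution building height.

Plain Python lists stand in for image tensors.

```python
def coarsen(grid, k):
    """Block-average a 2D list over non-overlapping k x k windows.

    Assumes both grid dimensions are divisible by k.
    """
    h, w = len(grid), len(grid[0])
    if h % k or w % k:
        raise ValueError("grid dimensions must be divisible by k")
    return [
        [
            sum(grid[bi * k + di][bj * k + dj] for di in range(k) for dj in range(k)) / (k * k)
            for bj in range(w // k)
        ]
        for bi in range(h // k)
    ]


def upsample_nearest(grid, k):
    """Repeat each coarse cell k times along both axes (nearest neighbour)."""
    return [
        [v for v in row for _col in range(k)]
        for row in grid
        for _row in range(k)
    ]


def build_condition(building_height, delta_t, k):
    """Hypothetical 2-channel conditioning stack: [building height, coarsened dT].

    Upsampling the coarse map back to full resolution (so both channels share
    one grid) is an assumption, not a documented detail of the paper.
    """
    coarse = upsample_nearest(coarsen(delta_t, k), k)
    return [building_height, coarse]


height = [[12.0, 12.0, 30.0, 30.0] for _ in range(4)]  # made-up heights (m)
delta_t = [                                           # made-up dT map (deg C)
    [1.0, 3.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [2.0, 2.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
]
cond = build_condition(height, delta_t, k=2)
print(cond[1])
```

Coarsening discards the within-block structure (here, the 1.0 vs. 3.0 split in the top-left block), so the generator only has to match block-level temperature trends.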

Key Designs

  1. Forward-Inverse Conflated Training:

    • Function: Simultaneously achieving diversity and temperature specificity
    • Core idea: The inverse diffusion model generates NDVI from coarsened temperature conditions and building height. During training, the generated NDVI is fed into a frozen forward model to predict the regional mean temperature. The discrepancy between this prediction and the ground-truth regional mean temperature serves as an additional loss, so temperature constraints are imposed at the aggregated regional scale rather than the pixel scale
    • Design motivation: Pixel-level constraints over-constrain the generation and eliminate diversity; regional-level constraints preserve spatial degrees of freedom
  2. Coarsened Temperature Conditioning:

    • Function: Preventing fine-grained temperature maps from over-determining the generation
    • Core idea: The temperature condition is spatially coarsened (downsampled) so that the model only needs to satisfy regional temperature trends rather than the exact spatial temperature distribution
    • Design motivation: Conditioning directly on fine-resolution temperature maps makes the mapping nearly one-to-one, eliminating most diversity
  3. EDM Diffusion Framework:

    • Function: Robust conditional generation
    • Core idea: Builds on the EDM preconditioning and sampling design (Karras et al., 2022, "Elucidating the Design Space of Diffusion-Based Generative Models"), with the conditional input given as a 2-channel stack of building height and coarsened temperature
    • Design motivation: EDM offers superior training efficiency and robustness compared to standard DDPM
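A minimal sketch of EDM-style preconditioning for the inverse model follows. It uses the coefficient formulas and default constants from Karras et al. (2022): `sigma_data = 0.5`, with ln(sigma) drawn from a normal distribution with mean -1.2 and standard deviation 1.2. How the 2-channel condition enters the network is an assumption here. The hypothetical `raw_net` callable simply receives the condition alongside the scaled noisy input. Images are flattened to 1D lists for brevity.

```python
import math
import random

SIGMA_DATA = 0.5           # EDM default; how NDVI is scaled to match it is an assumption
P_MEAN, P_STD = -1.2, 1.2  # EDM training noise: ln(sigma) ~ N(P_MEAN, P_STD^2)


def edm_coefficients(sigma, sigma_data=SIGMA_DATA):
    """Preconditioning terms c_skip, c_out, c_in, c_noise from Karras et al. (2022)."""
    denom = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / denom
    c_out = sigma * sigma_data / math.sqrt(denom)
    c_in = 1.0 / math.sqrt(denom)
    c_noise = 0.25 * math.log(sigma)
    return c_skip, c_out, c_in, c_noise


def loss_weight(sigma, sigma_data=SIGMA_DATA):
    """EDM loss weight lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def sample_training_sigma(rng):
    """Draw a training noise level from EDM's log-normal schedule."""
    return math.exp(rng.gauss(P_MEAN, P_STD))


def denoise(noisy_ndvi, sigma, condition, raw_net):
    """D(x; sigma, cond) = c_skip * x + c_out * F(c_in * x, c_noise, cond).

    `raw_net` is a hypothetical stand-in for the conditional U-Net; it takes
    the scaled noisy pixels, the noise embedding and the condition, and
    returns one value per pixel.
    """
    c_skip, c_out, c_in, c_noise = edm_coefficients(sigma)
    raw = raw_net([c_in * x for x in noisy_ndvi], c_noise, condition)
    return [c_skip * x + c_out * f for x, f in zip(noisy_ndvi, raw)]


def zero_net(scaled_x, c_noise, condition):
    """Trivial placeholder network that always outputs zeros."""
    return [0.0] * len(scaled_x)


rng = random.Random(0)
sigma = sample_training_sigma(rng)
print(sigma, denoise([0.3, -0.1], sigma, condition=None, raw_net=zero_net))
```

With the zero network, the output reduces to `c_skip * x`. As sigma shrinks, `c_skip` approaches 1, so nearly clean inputs pass through almost unchanged. `loss_weight(sigma)` equals `1 / c_out**2`, which is how EDM equalizes the effective loss weight across noise levels.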

Loss Function / Training Strategy

The total loss is the diffusion denoising loss plus a forward-model temperature consistency loss: the MSE between predicted and ground-truth regional mean temperature. The forward model is trained independently first, then frozen during inverse model training. Training uses Landsat 8 satellite imagery from 20 U.S. cities.
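The combined objective can be sketched as below. The linear `toy_forward_model` and the weight `lambda_temp` are hypothetical stand-ins, not the paper's U-Net or its actual loss weighting. The code only evaluates loss values and does not compute gradients.

```python
def regional_mean(grid):
    """Aggregate a 2D list to a single regional mean."""
    cells = [v for row in grid for v in row]
    return sum(cells) / len(cells)


def toy_forward_model(ndvi, height):
    """Hypothetical stand-in for the frozen U-Net forward model.

    Made-up linear rule: more vegetation cools, taller buildings warm.
    The real forward model is a learned, nonlinear network.
    """
    return [[-2.0 * n + 0.1 * h for n, h in zip(nr, hr)]
            for nr, hr in zip(ndvi, height)]


def temperature_consistency_loss(ndvi_hat, height, delta_t_true, forward_model):
    """Squared error between predicted and true regional mean temperature change.

    Only the aggregate is compared, so any NDVI arrangement whose predicted
    regional mean matches the target incurs no penalty.
    """
    predicted = forward_model(ndvi_hat, height)
    return (regional_mean(predicted) - regional_mean(delta_t_true)) ** 2


def conflated_loss(denoise_loss, ndvi_hat, height, delta_t_true, forward_model,
                   lambda_temp=1.0):
    """Denoising loss + lambda_temp * temperature consistency (lambda_temp is hypothetical).

    In an autodiff framework, gradients would flow through the frozen forward
    model into the generator while the forward model's weights stay fixed.
    """
    return denoise_loss + lambda_temp * temperature_consistency_loss(
        ndvi_hat, height, delta_t_true, forward_model)


height = [[10.0, 10.0], [10.0, 10.0]]  # made-up building heights
target = [[0.0, 0.0], [0.0, 0.0]]      # made-up ground-truth temperature change
ndvi_a = [[0.8, 0.2], [0.2, 0.8]]
ndvi_b = [[0.2, 0.8], [0.8, 0.2]]
ndvi_c = [[0.1, 0.1], [0.1, 0.1]]
for name, ndvi in [("a", ndvi_a), ("b", ndvi_b), ("c", ndvi_c)]:
    print(name, temperature_consistency_loss(ndvi, height, target, toy_forward_model))
```

Under the toy linear model, the diagonal and anti-diagonal layouts are penalized identically because only their regional means matter. This is the spatial freedom the regional-scale constraint is meant to preserve. A nonlinear forward model would not make arrangements exactly interchangeable, but the constraint still compares only one aggregate per region.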

Key Experimental Results

Main Results

| Method | Diversity (FID diversity) ↑ | Temperature Error (RMSE, °C) ↓ | Temperature Control Rate ↑ |
|---|---|---|---|
| Deterministic Regression | 1.0× | 2.85 | 62% |
| cGAN | 2.1× | 2.15 | 71% |
| Standard Diffusion | 2.8× | 2.42 | 68% |
| Proposed (Conflated) | 3.4× | 1.79 | 85% |

Ablation Studies

| Configuration | Diversity ↑ | Temp. Error (°C) ↓ | Description |
|---|---|---|---|
| Full model | 3.4× | 1.79 | Conflated training + coarsened conditioning |
| No forward-model supervision | 3.8× | 2.85 | Diverse, but temperature uncontrolled |
| Fine temperature conditioning | 1.2× | 1.52 | Accurate temperature, but little diversity |
| No coarsening | 1.8× | 1.95 | Moderate performance |

Key Findings

  • Coarsened temperature conditioning is the critical design for balancing diversity and specificity
  • Forward-model supervision reduces temperature error from 2.85°C to 1.79°C (a 37% reduction) at a modest cost in diversity (from 3.8× to 3.4×)
  • The model can generate NDVI-temperature combinations absent from the training data
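The percentages quoted above follow directly from the two tables. Here is a quick check in Python that uses only numbers reported in this summary:

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Ablation table: temperature error without vs. with forward-model supervision.
error_reduction = relative_reduction(2.85, 1.79)
# Ablation table: diversity multipliers without vs. with supervision.
diversity_retained = 3.4 / 3.8
print(f"temperature error reduced by {error_reduction:.0%}")  # prints 37%
print(f"diversity retained: {diversity_retained:.0%}")        # prints 89%

# Main results table: RMSE reduction of the proposed model vs. each baseline.
baselines = {"Deterministic Regression": 2.85, "cGAN": 2.15, "Standard Diffusion": 2.42}
for name, rmse in baselines.items():
    print(f"{name}: {relative_reduction(rmse, 1.79):.0%} lower RMSE")
```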

Highlights & Insights

  • Modeling the inverse problem as conditional generation rather than optimization is a natural abstraction because it acknowledges the one-to-many nature of the problem
  • The trade-off between regional-level and pixel-level constraints is a transferable insight that likely applies to other generation tasks requiring both diversity and constraint satisfaction
  • Validation across 20 U.S. cities spanning different climate zones suggests good generalizability

Limitations & Future Work

  • Only NDVI and building height are considered, neglecting water bodies, roads, and other land cover types
  • Satellite image resolution (30 m Landsat pixels) limits fine-scale vegetation planning
  • Only temperature change is validated; the actual impact on human thermal exposure is not evaluated
  • Extension to multi-objective optimization (temperature + carbon sequestration + biodiversity) is a promising direction

Comparison with Related Work

  • vs. traditional urban planning optimization: Optimization methods produce a single deterministic solution, whereas this work generates multiple feasible plans for planners to choose from
  • vs. DiffusionSat: DiffusionSat generates satellite imagery, whereas this work uses a diffusion model to generate vegetation configurations that satisfy physical constraints

Rating

  • Novelty: ⭐⭐⭐⭐ Both the forward-inverse conflated framework and regional-level constraints are novel
  • Experimental rigor: ⭐⭐⭐⭐ 20 cities + detailed ablations
  • Writing quality: ⭐⭐⭐⭐ Clear problem formalization
  • Impact: ⭐⭐⭐⭐ Practical value for urban climate adaptation planning