Dual-Band Thermal Videography: Separating Time-Varying Reflection and Emission Near Ambient Conditions
Conference: CVPR 2026
arXiv: 2509.11334
Code: dual-band-thermal.github.io
Area: Others (Computational Imaging)
Keywords: thermal imaging, dual-band, reflection-emission separation, thermal radiation, emissivity estimation
TL;DR
This paper proposes a dual-band long-wave infrared (LWIR) video analysis framework that combines a spectral cue (a constant emissivity ratio between the two bands) with a temporal cue (object radiance varies smoothly, whereas background radiance can change abruptly). The authors present it as the first method to separate reflected and emitted components pixel-wise in dynamic scenes near ambient temperature; it also recovers per-pixel emissivity and temperature fields.
Background & Motivation
Thermal cameras capture LWIR (8–14 µm) radiation comprising two components:
Emitted component: thermal radiation determined by the object's own temperature and emissivity
Reflected component: reflection of background radiation from the surrounding environment
Separating these two components is a long-standing challenge in thermal imaging. The core difficulties are:
- Under-constrained problem: even with multi-band measurements, the problem remains indeterminate without emissivity priors
- Near-ambient conditions: when object temperature is close to ambient, emission and reflection signals are of comparable magnitude, making separation more difficult
- Overly strong prior assumptions: industrial settings assume negligible background (objects far above ambient temperature); controlled environments assume spatiotemporally uniform background; the graybody assumption (constant emissivity across LWIR) is often violated
Two key insights in this paper break these limitations:
Spectral cue: Object emissivity may vary with wavelength within LWIR sub-bands (violating the graybody assumption), but the emissivity ratio between two bands \(k_1 = \epsilon_2/\epsilon_1\) remains constant.
Temporal cue: Object temperature evolves smoothly under thermal conduction, whereas background reflections can change abruptly due to human or object motion.
Method
Overall Architecture
Given dual-band thermal video sequences \(\{I_m^1, \ldots, I_m^N\}\) (\(m \in \{1,2\}\)), the method estimates three quantities:
- Per-band emissivity \(\epsilon_m\)
- Time-varying object temperature \(T_o(t)\)
- Time-varying effective background temperature \(T_b(t)\)
Two operating modes are considered: calibrated and uncalibrated.
Key Designs
Thermal imaging model:
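- Per-pixel radiance is constrained by energy conservation combined with Kirchhoff's law (absorptivity equals emissivity): \(\epsilon(\lambda) + \tau(\lambda) + r(\lambda) = 1\); for an opaque surface (\(\tau = 0\)) the reflectivity is \(r = 1 - \epsilon\)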
- After gain/offset correction, pixel intensity is expressed as: \(I_m(t) = \epsilon_m U_m(T_o(t)) + (1-\epsilon_m) U_m(T_b(t))\)
- where \(U_m(T)\) follows the Sakuma–Hattori model, which can be linearly approximated near ambient temperature as \(U_m(T) = a_m T + b_m\)
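The imaging model above can be sketched in a few lines of NumPy. This is only an illustration under the linearized response \(U_m(T) = a_m T + b_m\); the function names and all numeric values (emissivity, temperatures, `a`, `b`) are hypothetical and not taken from the paper.

```python
import numpy as np

def band_radiance(T, a, b):
    """Linearized band response U_m(T) = a_m * T + b_m (assumed valid near ambient)."""
    return a * np.asarray(T, dtype=float) + b

def pixel_intensity(eps, T_obj, T_bg, a, b):
    """I_m(t) = eps_m * U_m(T_o(t)) + (1 - eps_m) * U_m(T_b(t)) for an opaque surface."""
    return eps * band_radiance(T_obj, a, b) + (1.0 - eps) * band_radiance(T_bg, a, b)

# Hypothetical scene: an object that cools slowly while the reflected
# background jumps by 10 K at frame 50 (e.g. a person walks into view).
t = np.arange(100)
T_obj = 320.0 - 0.05 * t
T_bg = np.where(t < 50, 295.0, 305.0)
I1 = pixel_intensity(0.8, T_obj, T_bg, a=1.0, b=0.0)
```

The resulting trace drifts smoothly with the object temperature and shows a step at the background change, which is the structure the temporal cue relies on.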
Calibrated mode:
- Thermocouple measurements of surface temperature and a blackbody reference background with known temperature are used
- Emissivity is computed analytically from known quantities: \(\epsilon_m = \frac{I_m - U_m(T_b)}{U_m(T_o) - U_m(T_b)}\)
- In new scenes, \(T_o\) and \(T_b\) are recovered in closed form via a dual-band linear system
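A minimal sketch of the calibrated mode, assuming the linearized response \(U_m(T) = a_m T + b_m\). The helper names and the way the 2×2 system is assembled are illustrative choices, not the authors' implementation. Under this linearization the system determinant is \(a_1 a_2 (\epsilon_1 - \epsilon_2)\), so the solve is only well posed when the two band emissivities differ.

```python
import numpy as np

def calibrate_emissivity(I, T_obj, T_bg, a, b):
    """eps_m = (I_m - U_m(T_b)) / (U_m(T_o) - U_m(T_b)) from known temperatures."""
    U_o = a * T_obj + b
    U_b = a * T_bg + b
    return (I - U_b) / (U_o - U_b)

def recover_temperatures(I, eps, a, b):
    """Solve the dual-band linear system for [T_obj, T_bg] at one pixel and frame.

    Row m reads: I_m - b_m = a_m * eps_m * T_obj + a_m * (1 - eps_m) * T_bg.
    """
    I, eps, a, b = (np.asarray(x, dtype=float) for x in (I, eps, a, b))
    E = np.column_stack([a * eps, a * (1.0 - eps)])
    return np.linalg.solve(E, I - b)
```

In practice the conditioning of `E` matters as much as its invertibility: when \(\epsilon_1 \approx \epsilon_2\), small intensity noise is amplified strongly in the recovered temperatures.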
Uncalibrated mode (core contribution):
- Problem analysis: In the dual-band system \(\mathbf{I} = \mathbf{E}\mathbf{T}\), the matrix \(\mathbf{E}\) is 2×2 but nearly degenerate (Sakuma–Hattori functions are approximately linear near ambient temperature), requiring additional constraints
- Static background constraint: At pixels with constant background radiance, only the emitted term changes over time, so the ratio of dual-band temporal derivatives (normalized by \(a_2/a_1\)) yields the constant \(k_1 = \epsilon_2/\epsilon_1\)
- Dynamic background constraint: The signal is decomposed into a smooth component \(\tilde{I}_m(t)\) (emission-dominated) and a residual (reflection). The ratio of dual-band residuals (again normalized by \(a_2/a_1\)) yields the constant \(k_2 = (1-\epsilon_2)/(1-\epsilon_1)\)
- Joint recovery: Emissivity is analytically recovered from \(k_1\) and \(k_2\): \(\epsilon_1 = \frac{k_2-1}{k_2-k_1}\), \(\epsilon_2 = k_1 \cdot \frac{k_2-1}{k_2-k_1}\)
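The joint recovery follows from substituting \(\epsilon_2 = k_1 \epsilon_1\) into \(1 - \epsilon_2 = k_2 (1 - \epsilon_1)\) and solving for \(\epsilon_1\). The sketch below implements that closed form, plus one possible way to estimate \(k_1\) at a static-background pixel. The least-squares slope estimator and the function names are assumptions made for illustration; the paper obtains the ratios inside its optimization.

```python
import numpy as np

def estimate_k1(I1, I2, a1, a2):
    """Estimate k1 = eps2 / eps1 at a pixel whose background radiance is constant.

    Uses the least-squares slope (through the origin) of the band-2 frame
    differences against the band-1 frame differences, then removes a2/a1.
    """
    d1 = np.diff(np.asarray(I1, dtype=float))
    d2 = np.diff(np.asarray(I2, dtype=float))
    slope = np.dot(d1, d2) / np.dot(d1, d1)
    return slope * a1 / a2

def emissivity_from_ratios(k1, k2):
    """eps1 = (k2 - 1) / (k2 - k1), eps2 = k1 * eps1."""
    if np.isclose(k1, k2):
        # k1 == k2 implies eps1 == eps2 (graybody), where the ratios carry no information.
        raise ValueError("k1 == k2: emissivity is not identifiable from the two ratios")
    eps1 = (k2 - 1.0) / (k2 - k1)
    return eps1, k1 * eps1
```

For example, \(\epsilon_1 = 0.8\) and \(\epsilon_2 = 0.6\) give \(k_1 = 0.75\) and \(k_2 = 2\), and the closed form returns the original pair.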
Optimization framework: Four variable groups are jointly optimized: \(\tilde{I}_1(t)\) (smooth signal), \(\tilde{I}_2(0)\) (offset), \(\epsilon_m\) (emissivity), \(I_m^\varepsilon(t)\) (noise model)
- Smooth signal estimation: Using the \(k_1\) constraint, \(\tilde{I}_2(t)\) is constructed recursively as \(\tilde{I}_2(t) = \tilde{I}_2(t-1) + k_1 \frac{a_2}{a_1}(\tilde{I}_1(t) - \tilde{I}_1(t-1))\)
- Reconstruction constraint: Using \(k_2\), the band-2 signal is reconstructed as \(\hat{I}_2(t) = \tilde{I}_2(t) + k_2 \frac{a_2}{a_1}(I_1(t) - \tilde{I}_1(t))\), i.e., the smooth band-2 signal plus the band-1 residual scaled by \(k_2 \frac{a_2}{a_1}\)
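The two constraints can be sketched as follows. The recursion for \(\tilde{I}_2(t)\) telescopes, so it is written in closed form relative to frame 0. The reconstruction assumes that the band-2 residual equals the band-1 residual \(I_1(t) - \tilde{I}_1(t)\) scaled by \(k_2 \frac{a_2}{a_1}\), which is what the definition of \(k_2\) implies under the linearized model. Function names are hypothetical.

```python
import numpy as np

def smooth_band2(I1_smooth, I2_smooth_0, k1, a1, a2):
    """Build the smooth band-2 signal from the smooth band-1 signal.

    Equivalent to iterating
        I2s[t] = I2s[t-1] + k1 * (a2 / a1) * (I1s[t] - I1s[t-1])
    starting from the offset I2s[0] = I2_smooth_0.
    """
    I1_smooth = np.asarray(I1_smooth, dtype=float)
    return I2_smooth_0 + k1 * (a2 / a1) * (I1_smooth - I1_smooth[0])

def reconstruct_band2(I1, I1_smooth, I2_smooth, k2, a1, a2):
    """Predict band 2 as its smooth part plus the scaled band-1 residual."""
    residual_1 = np.asarray(I1, dtype=float) - np.asarray(I1_smooth, dtype=float)
    return np.asarray(I2_smooth, dtype=float) + k2 * (a2 / a1) * residual_1
```

Comparing the output of `reconstruct_band2` with the observed band-2 video is what a reconstruction loss would penalize; on noise-free synthetic data generated from the linear model the two agree exactly.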
Loss & Training
This is a non-learning method based on classical optimization (no neural networks):
- \(\mathcal{L}_{smooth}\): second-order temporal smoothness prior \(\|\tilde{I}_m(t-1) - 2\tilde{I}_m(t) + \tilde{I}_m(t+1)\|^2\)
- \(\mathcal{L}_{Huber}\): normalized Huber loss (robust to abrupt background changes)
- \(\mathcal{L}_{MSE}\): MSE between reconstruction and observation
- \(\mathcal{L}_{noise}\): noise model regularization (L2 + zero-mean constraint)
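Two of the terms above are easy to write down. The sketch below shows a second-order smoothness penalty and a standard Huber loss; the paper's exact normalization of the Huber term, the residual it is applied to, and the loss weights are not given here, so `delta` and the mean reduction are assumptions.

```python
import numpy as np

def smoothness_loss(x):
    """Sum over t of (x[t-1] - 2 * x[t] + x[t+1])**2 for a 1-D temporal signal."""
    x = np.asarray(x, dtype=float)
    second_diff = x[:-2] - 2.0 * x[1:-1] + x[2:]
    return float(np.sum(second_diff ** 2))

def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return float(np.mean(np.where(r <= delta, quadratic, linear)))
```

A linearly drifting signal has zero smoothness penalty, so slow heating or cooling is not penalized, while the linear tail of the Huber loss limits the influence of large residuals such as those produced by abrupt background changes.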
Hardware setup:
- Two FLIR Boson thermal cameras (≤40 mK / ≤20 mK NETD), 640×512 resolution
- Spectral filters: 8.5/9.5/10.6/12.1 µm, mounted on FW103H/M motorized filter wheels
- Ground truth: TC-08 data logger + Type-T thermocouples
Key Experimental Results
Main Results
Simulation experiments (temperature recovery error):
| Method | Description | Temperature Error |
|---|---|---|
| BCP | Blackbody Channel Prior | Large (fails for cold objects) |
| Two-wavelength pyrometry | Ignores background radiation | Degrades under high noise |
| Naive LS | Least squares (best of 5 random initializations) | Unstable |
| Ours | Dual-band + temporal constraints | Significantly outperforms all baselines under moderate noise |
Real-world temperature estimation:
| Scene | Ours (uncalib) | Ours (calib) | BCP | Naive | Notes |
|---|---|---|---|---|---|
| Wineglass | 1.72% | 5.04% | 14.6% | 31.68% | Peak 63.6°C |
| Coffee Pot | 5.34% | 0.36% | 6.62% | 45.5% | Peak 63.1°C |
Emissivity calibration comparison:
| Material | Reference | Mirror method | Ours |
|---|---|---|---|
| Chrome Ball | 0.10 | 0.43 | 0.12 |
| Al. Cup | 0.05 | 0.16 | 0.10 |
| Blue Paint | 0.87 | 0.88 | 0.88 |
| Glass Jar | 0.95 | 0.93 | 0.93 |
| Wineglass | 0.95 | 0.97 | 0.96 |
The proposed method substantially outperforms the FLIR-recommended reflector method for low-emissivity materials.
Ablation Study
| Loss term | Temperature error increase upon removal | Notes |
|---|---|---|
| Reconstruction term \(\mathcal{L}_{MSE}\) | 90.10% | Most critical |
| Smoothness term \(\mathcal{L}_{smooth}\) | 56.73% | Temporal prior is important |
| Huber loss | 11.31% | Robustness to abrupt changes |
| Noise term | 11.64% | More useful under high noise |
Key Findings
- The uncalibrated method achieves only 1.72% temperature error on the wineglass scene, approaching practical utility
- Low-emissivity materials (e.g., chrome ball, aluminum cup) show the largest advantage — existing methods fail substantially
- Spectral filter selection: the 9.5 µm filter yields the best-conditioned dual-band system and is therefore best suited for pairing with the full band
- Thermal signals that are invisible to the eye are successfully separated: e.g., fingerprint heat dissipation on glass vs. finger reflections, slow coffee pot cooling vs. moving human body reflections
Highlights & Insights
- Elegant physics-driven modeling: rather than relying on neural networks, the method derives constraints from thermal radiation physics, leveraging dual-band and temporal priors to regularize the under-constrained problem
- Dual calibrated/uncalibrated modes: the calibrated mode requires thermocouple and blackbody references, while the uncalibrated mode is fully automatic; in the reported real-world scenes the calibrated mode is more accurate on the coffee pot (0.36% vs. 5.34%) but less accurate on the wineglass (5.04% vs. 1.72%)
- First separation in dynamic near-ambient scenes: no prior method could operate under conditions where reflected and emitted signals are of comparable magnitude
- Reveals rich information in thermal imagery: scene information is uncovered in dimensions entirely imperceptible to the human eye and visible-light cameras
- Explicit noise modeling: spectral filtering reduces SNR; per-pixel per-frame noise modeling improves robustness
Limitations & Future Work
- Assumption of uncorrelated background and object temperature variations: the assumption breaks down when the entire room heats uniformly
- Sensitivity limitations of low-cost microbolometers with filters: small temperature differences on low-emissivity objects are difficult to detect
- Requires two cameras: a beamsplitter was not used (due to SNR loss), but dual cameras introduce parallax
- LWIR only: extension to MWIR or additional bands may further improve accuracy
- Non-real-time: computational efficiency of the optimization framework is not discussed
Related Work & Insights
- BCP (Blackbody Channel Prior): a recent reflection removal method from the CV community, but assumes the locally brightest pixel approximates a blackbody — fails for cold objects
- Two-wavelength pyrometry: a classical industrial method assuming graybody objects and neglecting background radiation
- Shape from Heat Conduction (ECCV 2024): recovers shape from thermal conduction; this paper complements it by handling the reflected component
- Inspiration: reflection/emission separation in thermal imaging is analogous to diffuse/specular separation in visible-light imaging, but additionally involves thermal conduction — a cross-domain problem at the intersection of light transport and heat transfer
Rating
- Novelty: ⭐⭐⭐⭐⭐ — The combination of dual-band and temporal constraints is a fundamentally new approach that elegantly resolves a long-standing under-constrained problem
- Experimental Thoroughness: ⭐⭐⭐⭐ — Simulation + real experiments + calibrated/uncalibrated comparisons + ablation; real-world scenes are somewhat limited
- Writing Quality: ⭐⭐⭐⭐⭐ — Physical derivations are rigorous and clear, flowing seamlessly from imaging model to constraint derivation to optimization
- Value: ⭐⭐⭐⭐ — Opens new analytical dimensions in thermal imaging with implications for computational thermal imaging, non-contact thermometry, and thermal NLOS sensing