S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion¶

Basic Information¶

Conference: ICLR 2026
arXiv: 2504.07667
Code: Project Page
Area: Computer Vision / Image Processing
Keywords: HDR Fusion, Synthetic Dataset, Domain Adaptation, Unreal Engine, Sim-to-Real

TL;DR¶

This paper proposes S2R-HDR, the first large-scale high-quality synthetic HDR fusion dataset (24,000 samples), and introduces S2R-Adapter, a domain adaptation method that bridges the synthetic-to-real gap, achieving state-of-the-art HDR fusion performance on real-world datasets.

Background & Motivation¶

State of the Field¶

HDR fusion is critical in computational photography, autonomous driving, and related fields. However, existing HDR datasets are extremely small in scale (the largest contains only 144 images) and are mostly confined to artificially controlled, simple dynamic scenes, failing to cover extreme conditions such as direct sunlight and large-scale motion.

Limitations of Prior Work¶

Extremely small scale: Kalantari (89 pairs), SCT (144 images), Challenge123 (123 images);

Limited dynamics: Most datasets contain only basic human motion, lacking diverse dynamic elements such as animals and vehicles;

Difficult data collection: Real HDR ground truth requires per-frame capture at different exposures with manually controlled motion, making collection time-consuming and hard to scale;

Limited dynamic range: Beam-splitter-based capture supports only two exposures and therefore cannot cover scenes with extremely high dynamic range.

Mechanism¶

The paper renders high-quality synthetic HDR data with Unreal Engine 5 and pairs it with a domain adaptation method to bridge the synthetic-to-real gap.

Method¶

1. S2R-HDR Dataset Construction¶

Rendering Pipeline Design¶

Modify UE5's default tone mapping and gamma correction to ensure outputs remain in linear HDR space
Store data in EXR floating-point format to avoid quantization artifacts
Simulate handheld camera shake to improve realism
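
The motivation for floating-point storage can be illustrated with a small numerical sketch. The radiance values below are hypothetical, and 16-bit half floats are assumed only because EXR commonly supports them; the summary does not state which precision the dataset uses.

```python
import numpy as np

# Hypothetical linear radiance values spanning a wide dynamic range.
radiance = np.array([0.001, 0.02, 0.5, 1.0, 8.0, 250.0], dtype=np.float32)

# 8-bit path: clip to [0, 1], then quantize to 256 levels.
ldr_8bit = np.round(np.clip(radiance, 0.0, 1.0) * 255.0) / 255.0

# Half-float path (assumed precision): no clipping is needed.
hdr_half = radiance.astype(np.float16).astype(np.float32)

# Relative error of each representation against the original values.
rel_err_8bit = np.abs(ldr_8bit - radiance) / radiance
rel_err_half = np.abs(hdr_half - radiance) / radiance

print("max relative error, 8-bit:", rel_err_8bit.max())
print("max relative error, half :", rel_err_half.max())
```

In this toy case the 8-bit path loses both the darkest value (rounded to zero) and the highlights (clipped at 1), while the half-float path keeps every value to within a fraction of a percent.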

Scene Diversity¶

Dynamic elements: Pedestrians, animals, vehicles, and other moving objects
Environment types: Indoor/outdoor, daytime/dusk/nighttime
Extreme lighting: Ultra-high dynamic range scenes including direct sunlight

Dataset Scale¶

1,000 sequences × 24 frames = 24,000 HDR images
Resolution 1920 × 1080, EXR format
166× larger than the previously largest dataset
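
The scale figures above follow from simple arithmetic, reproduced here as a quick check. The 144-image count is the SCT figure quoted earlier in these notes.

```python
num_sequences = 1_000
frames_per_sequence = 24
total_images = num_sequences * frames_per_sequence

largest_prior_dataset = 144  # SCT, as quoted in these notes
scale_ratio = total_images / largest_prior_dataset

print(total_images)           # 24000
print(round(scale_ratio, 1))  # 166.7
```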

2. S2R-Adapter Domain Adaptation¶

To bridge the synthetic-to-real domain gap, a plug-and-play dual-branch adapter is proposed:

Share Branch¶

Uses a low-rank adapter (inner dimension \(r_s\)) to retain shared knowledge from synthetic data and prevent catastrophic forgetting:

\[ f_s = U_s V_s x, \quad r_s \ll \min(h_{in}, h_{out}) \]

Transfer Branch¶

Uses a high-rank adapter (inner dimension \(r_t\)) to learn domain-specific knowledge and adapt to the real data distribution:

\[ f_t = U_t V_t x, \quad r_t \geq \max(h_{in}, h_{out}) \]

Final Output¶

The outputs of the two branches are scaled by the factors \(\alpha_s\) and \(\alpha_t\) and added to the output of the original layer weight \(W_0\):

\[ f = W_0 x + \alpha_s \times f_s + \alpha_t \times f_t \]
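
The fusion formula above can be sketched for a single linear layer in NumPy. This is an illustrative sketch, not the authors' implementation: the function name `dual_branch_forward`, the layer sizes, and the random weights are all assumptions chosen to satisfy the two rank conditions.

```python
import numpy as np

def dual_branch_forward(x, W0, Us, Vs, Ut, Vt, alpha_s=1.0, alpha_t=1.0):
    """f = W0 x + alpha_s * Us Vs x + alpha_t * Ut Vt x for one linear layer."""
    f_s = Us @ (Vs @ x)  # share branch, small inner dimension r_s
    f_t = Ut @ (Vt @ x)  # transfer branch, large inner dimension r_t
    return W0 @ x + alpha_s * f_s + alpha_t * f_t

# Illustrative sizes only: r_s < min(h_in, h_out), r_t >= max(h_in, h_out).
h_in, h_out, r_s, r_t = 8, 6, 2, 8
rng = np.random.default_rng(0)
x = rng.standard_normal(h_in)
W0 = rng.standard_normal((h_out, h_in))
Us, Vs = rng.standard_normal((h_out, r_s)), rng.standard_normal((r_s, h_in))
Ut, Vt = rng.standard_normal((h_out, r_t)), rng.standard_normal((r_t, h_in))

f = dual_branch_forward(x, W0, Us, Vs, Ut, Vt)
print(f.shape)  # (6,)
```

Setting both scaling factors to zero recovers the original layer output `W0 @ x`, which is one way to sanity-check such an adapter.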

Test-Time Adaptation (TTA)¶

When labeled real data is unavailable, the scaling factors are adjusted per input according to the model's uncertainty \(\mathcal{U}(x)\):

\[ \alpha_s = 1 - \mathcal{U}(x); \quad \alpha_t = 1 + \mathcal{U}(x) \]

Higher uncertainty places greater reliance on the Transfer Branch; lower uncertainty preserves more shared knowledge.
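
A minimal sketch of this rule follows. It assumes the uncertainty estimate is already available as a scalar in [0, 1]; how \(\mathcal{U}(x)\) is computed is not described in this summary, and the helper name `tta_scales` is hypothetical.

```python
def tta_scales(uncertainty):
    """Return (alpha_s, alpha_t) for a scalar uncertainty estimate U(x)."""
    alpha_s = 1.0 - uncertainty  # share branch weight shrinks as uncertainty grows
    alpha_t = 1.0 + uncertainty  # transfer branch weight grows as uncertainty grows
    return alpha_s, alpha_t

confident = tta_scales(0.0)   # (1.0, 1.0): both branches weighted equally
uncertain = tta_scales(0.25)  # (0.75, 1.25): more weight on the transfer branch
print(confident, uncertain)
```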

3. Zero Inference Overhead¶

Through re-parameterization, no additional computational cost is incurred at inference time.
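
The summary does not spell out the re-parameterization. One natural reading, in the style of LoRA weight merging, is that both linear branches are folded into the original weight once the scaling factors are fixed. The sketch below assumes a plain linear layer, and the scaling factors, sizes, and the name `merge_adapter` are hypothetical.

```python
import numpy as np

def merge_adapter(W0, Us, Vs, Ut, Vt, alpha_s, alpha_t):
    """Fold both branches into one weight matrix with the same shape as W0."""
    return W0 + alpha_s * (Us @ Vs) + alpha_t * (Ut @ Vt)

rng = np.random.default_rng(1)
h_in, h_out, r_s, r_t = 8, 6, 2, 8
W0 = rng.standard_normal((h_out, h_in))
Us, Vs = rng.standard_normal((h_out, r_s)), rng.standard_normal((r_s, h_in))
Ut, Vt = rng.standard_normal((h_out, r_t)), rng.standard_normal((r_t, h_in))
alpha_s, alpha_t = 0.7, 1.3  # hypothetical fixed scaling factors

x = rng.standard_normal(h_in)

# Training-time form: three separate paths.
three_branch = W0 @ x + alpha_s * (Us @ (Vs @ x)) + alpha_t * (Ut @ (Vt @ x))

# Inference-time form: a single matrix multiply with the merged weight.
W_merged = merge_adapter(W0, Us, Vs, Ut, Vt, alpha_s, alpha_t)
single_matmul = W_merged @ x

outputs_match = bool(np.allclose(three_branch, single_matmul))
print(outputs_match)
```

Because the merged weight has the same shape as the original one, the adapted layer costs the same as the unadapted layer at inference time.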

Experiments¶

Main Results: HDR Fusion on Real-World Datasets¶

Method	SCT PSNR-μ	SCT SSIM-μ	Challenge123 PSNR-μ	Challenge123 SSIM-μ
DHDRNet	40.05	0.9794	37.83	0.9707
AHDRNet	42.08	0.9837	40.44	0.9877
HDR-Transformer	42.39	0.9844	40.70	0.9881
SCTNet	42.55	0.9850	40.65	—
EHDRNet (S2R-HDR)	42.93	0.9858	42.15	0.9895
EHDRNet + S2R-Adapter	43.47	0.9871	41.89	0.9891
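
PSNR-μ is not defined in this summary. In the HDR fusion literature it conventionally denotes PSNR computed after μ-law tone mapping with μ = 5000, following Kalantari et al. The sketch below assumes that convention and HDR values normalized to [0, 1]; the function names are illustrative.

```python
import numpy as np

def mu_law(hdr, mu=5000.0):
    """Mu-law tone mapping for HDR values normalized to [0, 1]."""
    return np.log1p(mu * hdr) / np.log1p(mu)

def psnr_mu(pred, target, mu=5000.0):
    """PSNR between tone-mapped prediction and tone-mapped ground truth."""
    mse = np.mean((mu_law(pred, mu) - mu_law(target, mu)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(1.0 / mse))

# Toy example: the prediction is a uniformly darkened copy of the target.
target = np.linspace(0.0, 1.0, 5)
pred = 0.9 * target
score = psnr_mu(pred, target)
print(score)
```

The tone mapping compresses highlights, so errors in dark regions weigh more heavily than they would in a PSNR computed on linear values.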

Ablation Study: Domain Adaptation Component Analysis¶

Configuration	SCT PSNR-μ	Challenge123 PSNR-μ
S2R-HDR training only	41.32	39.85
+ Share Branch	42.15	40.71
+ Transfer Branch	42.78	41.43
+ Share + Transfer (S2R-Adapter)	43.47	42.15
Direct fine-tuning on real data	42.55	40.65

Dataset Quality Comparison¶

Metric	Kalantari	SCT	Challenge123	S2R-HDR
FHLP ↑	15.07	12.43	26.91	28.02
EHL ↑	3.07	2.43	5.19	5.47
SI ↑	18.4	18.25	20.47	38.02
DR ↑	2.71	2.55	2.36	3.86
# Samples	89	144	123	24,000

Key Findings¶

Models trained on S2R-HDR significantly outperform those trained on small-scale real datasets, even in the presence of a domain gap;
S2R-Adapter effectively bridges the domain gap, yielding substantial improvements in both labeled and unlabeled adaptation settings;
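The dual-branch design outperforms single-branch alternatives: in the ablation, the Share Branch alone adds about 0.8 to 0.9 dB PSNR-μ over the S2R-HDR-only baseline and the Transfer Branch alone about 1.5 to 1.6 dB, while combining both adds about 2.2 to 2.3 dB;
Direct fine-tuning is inferior to S2R-Adapter (42.55 vs. 43.47 dB on SCT and 40.65 vs. 42.15 dB on Challenge123 in the ablation), which is attributed to overfitting and catastrophic forgetting when fine-tuning directly on real data;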
TTA remains effective: even without ground-truth annotations, test-time adaptation improves performance by approximately 0.5 dB.

Highlights & Insights¶

First large-scale synthetic HDR fusion dataset, with 24,000 samples covering diverse scenes and extreme lighting conditions
Customized UE5 rendering pipeline preserves linear HDR space and simulates handheld camera shake
S2R-Adapter is plug-and-play and compatible with both CNN and Transformer architectures
Supports both labeled domain adaptation and unlabeled test-time adaptation modes
Zero additional inference overhead through re-parameterization

Limitations & Future Work¶

Synthetic data still exhibits a texture distribution gap relative to real data (visible in t-SNE visualizations)
Rendered scenes, though diverse, remain limited and may not cover all real-world edge cases
UE5 rendering requires substantial computational resources and artistic design effort
The domain adaptation method depends on the representativeness of the calibration dataset

Related Work¶

HDR Datasets: Kalantari et al. (2017), SCT (Tel et al., 2023), Challenge123 (Kong et al., 2024)
HDR Fusion Methods: AHDRNet (Yan et al., 2019), HDR-Transformer (Liu et al., 2022), DiffHDR
Sim-to-Real Domain Adaptation: LoRA (Hu et al., 2021), TTA (Wang et al., 2022)
Synthetic Data: Li et al. (2023) for depth estimation, Yang et al. (2023) for semantic segmentation

Rating¶

Novelty: ⭐⭐⭐⭐ — First large-scale synthetic HDR dataset, filling a significant gap in the field
Technical Depth: ⭐⭐⭐⭐ — Well-designed rendering pipeline, dual-branch adapter, and TTA mechanism
Experimental Thoroughness: ⭐⭐⭐⭐ — Multi-benchmark, multi-architecture comparisons with comprehensive ablations
Value: ⭐⭐⭐⭐⭐ — Both the dataset and method are directly applicable to HDR research and product development