High Dynamic Range Novel View Synthesis with Single Exposure

Conference: ICML 2025
arXiv: 2505.01212
Code: github.com/prinasi/Mono-HDR-3D
Area: 3D Vision
Keywords: HDR Novel View Synthesis, Single Exposure, Camera Imaging Modeling, NeRF, 3D Gaussian Splatting

TL;DR

This paper is the first to pose HDR novel view synthesis (HDR-NVS) from single-exposure LDR images only, and proposes Mono-HDR-3D, a meta-algorithm framework grounded in camera imaging principles. An LDR-to-HDR Color Converter (L2H-CC) lifts LDR colors to HDR, and an HDR-to-LDR Color Converter (H2L-CC) closes the loop, so the HDR scene model can be trained without HDR supervision.

Background & Motivation

The goal of HDR novel view synthesis is to establish a 3D HDR scene model from LDR images and generate HDR rendered images from arbitrary viewpoints. Existing methods (HDR-NeRF, HDR-GS) rely on multi-exposure LDR images as training data, which suffers from the following inherent limitations:

Motion artifacts: Long-exposure frames accumulate blur due to object/camera motion, and displacement between different exposures generates ghosting.

Alignment difficulties: Different exposure times lead to differences in luminance distribution and local contrast, increasing registration difficulty.

High acquisition cost: Requires specialized equipment and multiple shots, making it hard to implement in dynamic environments or on mobile devices.

The authors propose a more practical and more challenging task: Single-Exposure HDR-NVS, that is, training using only LDR images captured at a single exposure time. The core challenge is that single-exposure images inevitably contain overexposed or underexposed regions, so the information is incomplete and HDR content cannot be reconstructed from them directly.

Method

Overall Architecture

Mono-HDR-3D is a meta-algorithm that can be seamlessly integrated into any NVS model, such as NeRF or 3DGS. The overall pipeline consists of three stages:

LDR 3D Scene Modeling: Takes single-exposure LDR images and camera poses as input to train a standard LDR 3D scene model (NeRF/3DGS).
LDR-to-HDR Lifting: Promotes the LDR color space to HDR via an L2H-CC (LDR-to-HDR Color Converter).
HDR-to-LDR Closed-Loop: Converts the HDR images back to LDR via an H2L-CC (HDR-to-LDR Color Converter) to form a closed loop, enabling self-supervised training without HDR labels.

Key design concept: First model LDR, then lift to HDR, rather than attempting to construct an HDR model directly from single-exposure LDR images (which fails). This is the reverse of the design route taken by prior methods.
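
The three stages can be sketched as a simple composition of functions. This is only an illustration of the data flow, not the authors' code: `closed_loop_step`, the constant stand-in renderer, and the gain-of-4 converter pair are all hypothetical, and MSE is used for both loss terms as in the NeRF instantiation.

```python
from typing import Callable, List


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equally long colour vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def closed_loop_step(
    render_ldr: Callable[[object], List[float]],
    l2h: Callable[[List[float]], List[float]],
    h2l: Callable[[List[float]], List[float]],
    pose: object,
    target_ldr: List[float],
    beta: float = 0.01,
) -> dict:
    """One forward pass through the three stages for a single pixel."""
    ldr = render_ldr(pose)      # stage 1: LDR 3D scene model
    hdr = l2h(ldr)              # stage 2: lift the LDR colour to HDR
    ldr_cycle = h2l(hdr)        # stage 3: map HDR back to LDR (closed loop)
    loss_ldr = mse(ldr, target_ldr)
    loss_h2l = mse(ldr_cycle, target_ldr)
    return {"hdr": hdr, "loss": loss_ldr + beta * loss_h2l}


# Hypothetical stand-ins: a constant "renderer" and a gain-of-4 converter pair.
def render(pose: object) -> List[float]:
    return [0.2, 0.4, 0.6]


def lift(color: List[float]) -> List[float]:
    return [4.0 * v for v in color]


def lower(color: List[float]) -> List[float]:
    return [v / 4.0 for v in color]


out = closed_loop_step(render, lift, lower, pose=None, target_ldr=[0.2, 0.4, 0.6])
```

Because the toy converters are exact inverses and the renderer already matches the target, both loss terms vanish here; in training, neither condition holds and the loop term is what ties the HDR output back to the observed LDR images.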

Key Designs

Camera Imaging Mechanism Modeling

The core innovation lies in designing the network architectures of L2H-CC and H2L-CC based on physical imaging formulas.

LDR Image Formation Formula (forward process from HDR to LDR):

\[I^l = \frac{\Delta t}{g} \cdot I^h + I_0 + \epsilon - I_{\text{overflow}}\]

where \(\Delta t\) is the exposure time, \(g\) is the sensor gain, \(I^h\) is the HDR pixel value, \(I_0\) is the dark current offset, \(\epsilon\) is the sensor noise, and \(I_{\text{overflow}}\) is the saturation overflow value. This formula uniformly describes the imaging process of both saturated and unsaturated pixels, which can be organized into two functional terms:

\(D(\cdot)\): Linearly scales HDR radiance to the LDR range.
\(B(\cdot)\): Learns the offset and correction of LDR radiance.
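
As a numeric illustration of the forward formula, the sketch below evaluates it for one pixel. It assumes that \(I_{\text{overflow}}\) is the amount by which the unclipped sensor response exceeds a white level of 1.0; the function name `hdr_to_ldr` and the `white` parameter are hypothetical, and clipping at the dark end is not modeled.

```python
def hdr_to_ldr(i_h, dt, g, i0=0.0, eps=0.0, white=1.0):
    """Forward imaging formula: I^l = (dt / g) * I^h + I_0 + eps - I_overflow.

    Returns (I^l, I_overflow). The overflow definition is an assumption:
    whatever part of the raw response exceeds the white level.
    """
    raw = dt / g * i_h + i0 + eps          # unclipped sensor response
    overflow = max(raw - white, 0.0)       # zero for unsaturated pixels
    return raw - overflow, overflow


# Unsaturated pixel: the response stays below the white level.
dark_pixel = hdr_to_ldr(2.0, dt=0.25, g=1.0)    # (0.5, 0.0)

# Saturated pixel: the excess is removed and the LDR value sits at 1.0.
bright_pixel = hdr_to_ldr(8.0, dt=0.25, g=1.0)  # (1.0, 1.0)
```

Both cases go through the same expression, which is the sense in which the formula describes saturated and unsaturated pixels uniformly.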

Inverse Formula (backward process from LDR to HDR):

\[I^h = \underbrace{\frac{g}{\Delta t}}_{X(\cdot)} \cdot \underbrace{(I^l - I_0 + I_{\text{overflow}})}_{S(\cdot)} - \underbrace{\frac{g}{\Delta t} \cdot \epsilon}_{Y(\cdot)}\]

This is decomposed into three functional terms:

\(X(\cdot)\): Linear amplification factor, which linearly maps LDR values to the HDR range.
\(S(\cdot)\): Offset correction, adjusting the amplified LDR values.
\(Y(\cdot)\): Noise correction term.
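
A round trip through both formulas shows why the offset term matters for saturated pixels. The sketch below is illustrative only: the function names are hypothetical, the overflow is assumed to be the excess above a white level of 1.0, and all physical quantities are assumed known, which is exactly what is not the case in practice (hence the learned branches).

```python
def forward(i_h, dt, g, i0, eps, white=1.0):
    """HDR -> LDR with the forward formula; returns (I^l, I_overflow)."""
    raw = dt / g * i_h + i0 + eps
    overflow = max(raw - white, 0.0)   # assumed definition of the overflow
    return raw - overflow, overflow


def inverse(i_l, dt, g, i0, eps, overflow):
    """LDR -> HDR with the inverse formula: I^h = X * S - Y."""
    x = g / dt                 # X: linear amplification factor
    s = i_l - i0 + overflow    # S: offset-corrected LDR value
    y = g / dt * eps           # Y: noise correction
    return x * s - y


i_l, overflow = forward(8.0, dt=0.25, g=1.0, i0=0.02, eps=0.01)

# With the true overflow, the HDR value is recovered (up to rounding).
exact = inverse(i_l, dt=0.25, g=1.0, i0=0.02, eps=0.01, overflow=overflow)

# Ignoring the overflow, the saturated pixel is badly underestimated.
naive = inverse(i_l, dt=0.25, g=1.0, i0=0.02, eps=0.01, overflow=0.0)
```

Here `exact` is 8.0 up to floating-point rounding, while `naive` is about 3.88, so a saturated pixel cannot be inverted by gain alone.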

L2H-CC (LDR-to-HDR Color Converter)

L2H-CC is a per-channel operation, and its network structure strictly simulates the three terms of the inverse formula:

Input Mapping: Linear layer + ReLU, embedding LDR colors into a latent feature space.
Three-Branch Simulation:
- \(X(\cdot)\) branch: MLP + ReLU (ensuring non-negativity to satisfy physical constraints).
- \(S(\cdot)\) branch: MLP + ReLU (non-negative correction values).
- \(Y(\cdot)\) branch: MLP without activation function (noise is inherently random, hence no non-negativity constraint).
Residual Connection: The LDR input is added to the converted output through a residual structure to preserve fine color details and stabilize the learning process.
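
The structure described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the released implementation: the layer widths, the random initialization, and in particular the way the three branch outputs are combined (`X * S - Y`, mirroring the inverse formula) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def init_mlp(sizes):
    """Random weights for a small MLP, e.g. sizes = [16, 16, 1]."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, h):
    """Hidden layers use ReLU; the last layer is left linear."""
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = relu(h)
    return h


class L2HCC:
    """Per-channel LDR-to-HDR converter with X, S and Y branches (sketch)."""

    def __init__(self, width=16):
        self.w_in = rng.normal(0.0, 0.1, (1, width))
        self.b_in = np.zeros(width)
        self.x, self.s, self.y = (init_mlp([width, width, 1]) for _ in range(3))

    def __call__(self, ldr):
        c = np.asarray(ldr, dtype=float)[..., None]  # each channel value on its own
        h = relu(c @ self.w_in + self.b_in)          # input mapping
        gain = relu(mlp(self.x, h))                  # X branch, non-negative
        shifted = relu(mlp(self.s, h))               # S branch, non-negative
        noise = mlp(self.y, h)                       # Y branch, no activation
        hdr = gain * shifted - noise + c             # assumed combination + residual
        return hdr[..., 0]


model = L2HCC()
ldr = rng.uniform(0.0, 1.0, size=(4, 3))  # four RGB pixels
hdr = model(ldr)                          # same shape as the input
```

Because every channel value is processed independently through the same weights, the converter works on any array shape, which is one way to read "per-channel operation".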

H2L-CC (HDR-to-LDR Color Converter, Closed-Loop Design)

H2L-CC simulates the forward imaging formula, mapping the rendered HDR images back to LDR. This allows for supervised learning by comparing with LDR training images, even in the absence of ground-truth HDR data:

Input Mapping: Linear layer + ReLU.
Two-Branch Simulation:
- \(D(\cdot)\) branch: Linear layer + ReLU (non-negative linear scaling).
- \(B(\cdot)\) branch: Linear layer + Tanh (offset correction, allowing positive and negative values).
Output Mapping: Sigmoid activation, constraining the values to the LDR range of [0,1].
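
A matching NumPy sketch of H2L-CC is given below. As before, it is an illustration under assumptions rather than the authors' code: the feature width, the initialization, and the choice to sum the two branch outputs before the sigmoid (mirroring \(D(\cdot) + B(\cdot)\) in the forward formula) are all assumed.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(n_in, n_out):
    """Random weights and zero bias for one linear layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


class H2LCC:
    """Per-channel HDR-to-LDR converter with D and B branches (sketch)."""

    def __init__(self, width=16):
        self.inp = linear(1, width)
        self.d = linear(width, 1)
        self.b = linear(width, 1)

    def __call__(self, hdr):
        c = np.asarray(hdr, dtype=float)[..., None]
        h = np.maximum(c @ self.inp[0] + self.inp[1], 0.0)       # input mapping
        scaled = np.maximum(h @ self.d[0] + self.d[1], 0.0)      # D branch: ReLU
        offset = np.tanh(h @ self.b[0] + self.b[1])              # B branch: Tanh
        ldr = 1.0 / (1.0 + np.exp(-(scaled + offset)))           # sigmoid output
        return ldr[..., 0]


h2l = H2LCC()
hdr = rng.uniform(0.0, 10.0, size=(4, 3))  # four HDR pixels with values above 1
ldr_back = h2l(hdr)                        # values constrained to [0, 1]
```

The sigmoid guarantees a valid LDR range regardless of how large the HDR input is, so the output can always be compared against the LDR training images.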

Loss & Training

Overall loss function:

\[\mathcal{L} = \mathcal{L}_{\text{ldr}} + \alpha \mathcal{L}_{\text{hdr}} + \beta \mathcal{L}_{\text{h2l}}\]

Mono-HDR-GS Instantiation (when integrated with 3DGS):

\(\mathcal{L}_{\text{ldr}}\): L1 loss + D-SSIM loss (standard 3DGS loss), balanced by weight \(\lambda\).
\(\mathcal{L}_{\text{hdr}}\): L2 loss in the \(\mu\)-law domain, calculated after applying logarithmic compression to the HDR values.
\(\mathcal{L}_{\text{h2l}}\): Identical form as \(\mathcal{L}_{\text{ldr}}\), but applied to the output of H2L-CC.
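
The \(\mu\)-law term can be sketched as below. The compression constant \(\mu = 5000\) and the assumption that HDR values are normalized to \([0, 1]\) are common conventions in the HDR literature, not values stated in this summary; the function names are hypothetical.

```python
import numpy as np


def mu_law(hdr, mu=5000.0):
    """Logarithmic compression: log(1 + mu * x) / log(1 + mu)."""
    return np.log1p(mu * np.asarray(hdr, dtype=float)) / np.log1p(mu)


def hdr_loss(pred_hdr, gt_hdr, mu=5000.0):
    """L2 loss between mu-law compressed prediction and ground truth."""
    diff = mu_law(pred_hdr, mu) - mu_law(gt_hdr, mu)
    return float(np.mean(diff ** 2))


# The compression fixes the endpoints and stretches the dark range.
low, mid, high = mu_law(0.0), mu_law(0.01), mu_law(1.0)

example_loss = hdr_loss([0.01, 0.5, 1.0], [0.02, 0.5, 0.9])
```

Compressing before taking the difference keeps errors in bright regions from dominating the loss, since a given absolute error counts for more in the dark range than near the top of the scale.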

Mono-HDR-NeRF Instantiation (when integrated with NeRF): MSE is used for all three losses.

Hyperparameters: \(\alpha=0.6\), \(\beta=0.01\) (NeRF) / \(0.05\) (3DGS); L2H-CC learning rate of \(5 \times 10^{-4}\), H2L-CC learning rate of \(1 \times 10^{-3}\).

Key: In the strict single-exposure setting, \(\alpha=0\) (since no ground-truth HDR exists). In this case, the closed-loop design of H2L-CC serves as the only source of extra supervision.
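
The weighting can be written out as a small helper. This is a sketch with a hypothetical function name; the default weights are the 3DGS values quoted above, and passing no HDR term corresponds to the strict single-exposure case where \(\alpha = 0\).

```python
def total_loss(l_ldr, l_h2l, l_hdr=None, alpha=0.6, beta=0.05):
    """L = L_ldr + alpha * L_hdr + beta * L_h2l.

    If no HDR ground truth is available (l_hdr is None), the HDR term
    is dropped, which is equivalent to setting alpha = 0.
    """
    hdr_term = 0.0 if l_hdr is None else alpha * l_hdr
    return l_ldr + hdr_term + beta * l_h2l


# Strict single-exposure setting: only the LDR and closed-loop terms remain.
strict = total_loss(l_ldr=0.1, l_h2l=0.2)

# With HDR supervision available, all three terms contribute.
supervised = total_loss(l_ldr=0.1, l_h2l=0.2, l_hdr=0.5)
```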

Key Experimental Results

Main Results

Dataset: 8 synthetic scenes (Blender) + 4 real-world scenes. Each scene contains 35 images across 5 exposure times. Under the single-exposure setting, 1 exposure is randomly selected for training. Evaluation metrics: PSNR↑ / SSIM↑ / LPIPS↓.
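
For reference, PSNR can be computed as in the sketch below. It assumes images scaled to \([0, 1]\); how the HDR images are tone-mapped or normalized before the HDR metrics are computed is not specified in this summary, so this only illustrates the metric itself.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# A uniform error of 0.1 gives an MSE of 0.01, i.e. 20 dB.
target = np.zeros((8, 8, 3))
pred = np.full((8, 8, 3), 0.1)
value = psnr(pred, target)
```

Since the scale is logarithmic, every 10 dB of PSNR corresponds to a tenfold reduction in mean squared error.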

Synthetic Dataset Results (LDR + HDR NVS):

| Method | Speed (fps) | LDR-PSNR↑ | LDR-SSIM↑ | LDR-LPIPS↓ | HDR-PSNR↑ | HDR-SSIM↑ | HDR-LPIPS↓ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HDR-NeRF | 0.26 | 30.62 | 0.658 | 0.285 | 13.76 | 0.511 | 0.443 |
| Mono-HDR-NeRF | 0.26 | 38.78 | 0.936 | 0.048 | 32.86 | 0.940 | 0.068 |
| HDR-GS | 147.45 | 39.48 | 0.977 | 0.018 | 35.30 | 0.965 | 0.030 |
| Mono-HDR-GS | 136.97 | 41.68 | 0.983 | 0.009 | 38.57 | 0.975 | 0.012 |

Real Dataset Results (LDR NVS, without HDR ground truth):

| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
| --- | --- | --- | --- |
| HDR-NeRF | 32.50 | 0.948 | 0.069 |
| Mono-HDR-NeRF | 32.52 | 0.948 | 0.069 |
| HDR-GS | 35.34 | 0.966 | 0.019 |
| Mono-HDR-GS | 35.81 | 0.967 | 0.017 |

Ablation Study

Module Design Ablation (Synthetic data, HDR NVS metrics):

| Configuration | HDR-PSNR | HDR-SSIM | HDR-LPIPS | Description |
| --- | --- | --- | --- | --- |
| Replace L2H-CC with MLP | 19.02 | 0.778 | 0.327 | L2H-CC is core, performance drops heavily after replacement |
| Replace H2L-CC with MLP | 38.43 | 0.974 | 0.015 | Slight decrease, closed-loop design has a positive effect |
| Full Model | 38.57 | 0.975 | 0.012 | - |

Loss Combination Ablation (Synthetic data, HDR NVS metrics):

| Configuration | HDR-PSNR | HDR-SSIM | HDR-LPIPS | Description |
| --- | --- | --- | --- | --- |
| Only \(\mathcal{L}_{\text{ldr}}\) | - | - | - | Unable to train |
| Only \(\mathcal{L}_{\text{hdr}}\) | 33.93 | 0.925 | 0.050 | Basically usable |
| \(\mathcal{L}_{\text{ldr}}+\mathcal{L}_{\text{hdr}}\) | 38.19 | 0.974 | 0.015 | LDR provides geometric regularization |
| All three losses | 38.57 | 0.975 | 0.012 | Closed-loop contributes +0.38dB |

Key Findings

HDR-NeRF fails almost completely under single exposure: HDR PSNR is only 13.76dB (outputs tend toward all-black or all-white), indicating that multi-exposure methods do not transfer directly to the single-exposure setting.
Mono-HDR-NeRF vs HDR-NeRF: HDR PSNR improves by +19.1dB (13.76 to 32.86), showing an enormous leap in quality.
Mono-HDR-GS vs HDR-GS: HDR PSNR improves by +3.27dB (35.30 to 38.57), yielding a significant enhancement even on an already strong baseline.
No loss in efficiency: The inference speed of Mono-HDR-GS (136.97fps) is basically on par with HDR-GS (147.45fps).
Robustness to LDR/HDR ratio: Even when the ratio of LDR:HDR is 5:1, Mono-HDR-GS retains 92.6% of the peak PSNR.
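
The headline numbers in these findings follow directly from the tables above; the short script below simply recomputes them from the reported values (the dictionary and variable names are for illustration only).

```python
# HDR PSNR (dB) and speed (fps) as reported in the synthetic-data table.
hdr_psnr = {
    "HDR-NeRF": 13.76,
    "Mono-HDR-NeRF": 32.86,
    "HDR-GS": 35.30,
    "Mono-HDR-GS": 38.57,
    "MLP instead of L2H-CC": 19.02,
}
fps = {"HDR-GS": 147.45, "Mono-HDR-GS": 136.97}

gain_nerf = round(hdr_psnr["Mono-HDR-NeRF"] - hdr_psnr["HDR-NeRF"], 2)        # 19.1
gain_gs = round(hdr_psnr["Mono-HDR-GS"] - hdr_psnr["HDR-GS"], 2)              # 3.27
drop_l2h = round(hdr_psnr["Mono-HDR-GS"] - hdr_psnr["MLP instead of L2H-CC"], 2)  # 19.55
speed_ratio = round(fps["Mono-HDR-GS"] / fps["HDR-GS"], 3)                    # 0.929
```

The speed ratio of about 0.929 means Mono-HDR-GS renders roughly 7% slower than HDR-GS, which is the cost behind the "basically on par" statement.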

Highlights & Insights

Clever problem definition: Proposes single-exposure HDR-NVS as an independent problem for the first time, lowering data collection requirements while eliminating the inherent flaws of multi-exposure methods.
Physics-driven network design: The architectures of L2H-CC and H2L-CC are derived directly from camera imaging formulas. Each network branch corresponds to a physical term, which is significantly more effective than a black-box MLP (as confirmed by the ablation study, where replacing L2H-CC with an MLP caused HDR PSNR to drop by over 19dB, from 38.57 to 19.02).
Closed-loop self-supervision concept: The closed-loop design of H2L-CC allows for the indirect supervision of HDR space learning using only LDR images, providing an elegant label-free learning strategy.
Meta-algorithm design: Functions as a plug-and-play module that can be integrated into any NVS backbone, as validated by both NeRF and 3DGS instantiations.

Limitations & Future Work

Limited improvement in real-world scenes: LDR NVS gains on real-world data are marginal (PSNR rises by only 0.47dB with 3DGS, from 35.34 to 35.81, and by 0.02dB with NeRF), indicating that the method's advantage narrows under complex real illumination conditions.
Information ceiling in single exposure: Information in severely overexposed or underexposed regions is inherently lost, making it difficult for the network to truly recover it through learning alone.
Evaluation limitations: The ground-truth HDR for synthetic data comes from renderers, which may exhibit distribution shifts compared to real-world HDR images.
No video/dynamic scenes explored: The framework currently only processes multi-view images of static scenes.
Scalable to more backbones: The paper only validates NeRF and 3DGS, but integration into more efficient models like Instant-NGP or Zip-NeRF could be explored.

Related Work & Insights

HDR-NeRF (CVPR 2022): The first HDR-NVS method. It learns an implicit mapping from radiance to HDR colors based on NeRF, but requires multi-exposure images and is expensive to train and infer.
HDR-GS (NeurIPS 2024): HDR-NVS based on 3DGS. It significantly improves efficiency (1000× speedup) but still relies on multi-exposure images.
Single-Image HDR Reconstruction (Eilertsen 2017, DCDR-UNet 2024): 2D image LDR-to-HDR translation, which lacks 3D consistency.
Inspiration: Physics-inspired network architecture design (rather than purely data-driven) is highly effective in scenarios with limited prior information; the closed-loop design idea can be extended to other 3D reconstruction tasks that lack corresponding labels.

Rating

| Dimension | Score (1-5) | Description |
| --- | --- | --- |
| Novelty | 4.5 | First proposes the single-exposure HDR-NVS problem, with a novel physics-driven closed-loop design |
| Technical Depth | 4.0 | Rigorous formula derivation, with modules designed based on physical principles |
| Experimental Thoroughness | 4.0 | Synthetic + real-world data with multi-angle ablations, though real-world experiments are slightly weak |
| Practical Value | 4.5 | A plug-and-play meta-algorithm that significantly reduces data acquisition requirements |
| Writing Quality | 4.0 | Clear structure with well-elaborated motivation |
| Overall Score | 4.2 | Valuable problem definition, elegant method design, and persuasive experiments |