
Learning to See in the Extremely Dark

Info Content
Conference ICCV2025
arXiv 2506.21132
Code JianghaiSCU/SIED
Area Image Enhancement / Low-Light RAW Image Restoration
Keywords Extremely dark scenes, RAW image enhancement, diffusion models, data synthesis, illumination correction

TL;DR

This paper proposes a paired-to-paired data synthesis pipeline to construct SIED, a RAW image enhancement dataset for extremely dark scenes (down to 0.0001 lux), and designs a diffusion-model-based framework that achieves high-quality restoration of ultra-low-SNR RAW images via an Adaptive Illumination Correction Module (AICM) and a color consistency loss.

Background & Motivation

Core Challenges

Low-light RAW image enhancement must simultaneously address global/local contrast enhancement, noise suppression, detail preservation, and color mapping. Existing methods primarily operate under moderate low-light conditions (e.g., 0.03–5.0 lux as in the SID dataset), but when ambient illuminance drops to 0.0001 lux, two fundamental difficulties arise:

Data scarcity: Under extremely dark conditions, long-exposure reference images cannot be reliably captured (residual noise and motion blur are introduced).

Performance bottleneck: Extremely low SNR causes existing methods to produce severe color distortion, blurred details, and noise amplification.

Limitations of Prior Work

  • Single-stage methods (SID, DID, etc.): A single network is tasked with modeling both denoising and RAW-to-sRGB transformation, leading to domain ambiguity.
  • Multi-stage methods (LDC, MCR, etc.): Although tasks are decoupled, color bias remains under extremely dark conditions.
  • Pre-amplification strategy: Most methods rely on ground-truth (GT) exposure values for pre-amplification, which are unavailable in practice.
  • Dataset issues: SID, SDSD, SMID, etc. provide only coarse illuminance ranges and lack precise calibration.

Method

I. SIED Dataset Construction: Paired-to-Paired Synthesis Pipeline

Unlike conventional approaches that synthesize low-light images from normal-light images, this paper proposes a "synthesize even darker from low-light" strategy:

Step 1: Optical Laboratory Calibration

Sony α7RIII and Canon EOS R cameras are used in a professional optical laboratory (equipped with a PHOTO-2000μ illuminance meter) to capture standard RAW images at three illuminance levels:

  • 0.01–0.1 lux
  • 0.001–0.01 lux
  • 0.0001–0.001 lux

Step 2: Real-Scene Paired Capture

  • Low-light RAW and normal-light sRGB pairs are collected in diverse real scenes using a tripod and remote shutter.
  • Reference image exposure time is 20–200× that of the low-light image.
  • Images are cropped to 3840×2160 resolution; each subset contains 1,680 pairs.

Step 3: Illumination Alignment

\[I_{syn} = I_{cap} \times \left(\frac{\text{Expo}(I_{st})}{\text{Expo}(I_{cap})} + \eta\right)\]

Images are converted to YUV space via an ISP pipeline, and \(\eta\) is manually adjusted so that the Y-channel illuminance histogram matches the standard data. KL divergence is kept below 0.06.
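The alignment step above can be sketched in a few lines. The paper adjusts \(\eta\) manually; the sketch below instead scans a grid of candidate \(\eta\) values and keeps the one whose Y-channel histogram best matches the standard data under the same KL criterion. The helper names, bin count, and search range are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def y_histogram(y, bins=64):
    """Normalized Y-channel (luminance) histogram; a small floor avoids log(0)."""
    h, _ = np.histogram(y, bins=bins, range=(0.0, 1.0))
    h = h.astype(np.float64) + 1e-8
    return h / h.sum()

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

def calibrate_eta(i_cap, expo_ratio, y_std, etas=np.linspace(-0.5, 0.5, 101)):
    """Pick the eta whose synthesized Y histogram best matches the standard data.

    expo_ratio plays the role of Expo(I_st) / Expo(I_cap) in the equation above.
    """
    h_std = y_histogram(y_std)
    best_eta, best_kl = 0.0, float("inf")
    for eta in etas:
        i_syn = np.clip(i_cap * (expo_ratio + eta), 0.0, 1.0)
        kl = kl_divergence(y_histogram(i_syn), h_std)
        if kl < best_kl:
            best_eta, best_kl = float(eta), kl
    return best_eta, best_kl
```

A pair passes calibration when the returned KL value falls below the paper's 0.06 threshold.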

Step 4: Noise Addition

  • Gaussian + Poisson noise distributions for the Canon and Sony cameras are fitted from optical laboratory measurements.
  • A dark-frame database is incorporated to handle non-Poisson–Gaussian noise at extremely low light.
  • ISO range: 100–20,000 for the 0.01–0.1 and 0.001–0.01 lux levels; 100–40,000 for the 0.0001–0.001 lux level.
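A minimal Poisson–Gaussian noise model can be sketched as below. The gain parameters `k100` and `sigma100` are illustrative placeholders, not the paper's per-camera fitted values, and linear scaling with ISO is an assumption; the dark-frame component for non-Poisson–Gaussian noise is omitted.

```python
import numpy as np

def add_poisson_gaussian_noise(raw, iso, k100=0.005, sigma100=0.002, rng=None):
    """Heteroscedastic Poisson (shot) + Gaussian (read) noise on normalized RAW.

    k100 / sigma100 are illustrative gains at ISO 100, assumed to scale
    linearly with ISO; real pipelines fit these per camera.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = k100 * iso / 100.0          # shot-noise gain (signal-dependent)
    sigma = sigma100 * iso / 100.0  # read-noise standard deviation
    shot = rng.poisson(np.clip(raw, 0.0, None) / k) * k
    read = rng.normal(0.0, sigma, size=raw.shape)
    return np.clip(shot + read, 0.0, 1.0)
```

Raising the ISO increases both the shot-noise gain and the read-noise floor, which is what drives the SNR collapse at the 0.0001–0.001 lux level.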

The final dataset contains 1,500 training pairs + 180 evaluation pairs per illuminance level per camera subset.

II. Diffusion-Model-Based Enhancement Framework

Overall Architecture

  1. A RAW encoder \(\mathcal{E}_{raw}\) and an sRGB encoder \(\mathcal{E}_{rgb}\) map inputs to latent space.
  2. AICM applies adaptive illumination correction to the RAW features.
  3. The diffusion model uses the corrected RAW features as guidance to reconstruct sRGB features.
  4. The sRGB decoder produces the final output image.

Adaptive Illumination Correction Module (AICM)

Rather than relying on GT exposure values for pre-amplification, AICM estimates amplification factors directly from low-light RAW features:

  • Convolutional embedding → cascaded convolution + adaptive average pooling → per-channel amplification factor \(A_{raw} \in \mathbb{R}^{1 \times 1 \times C}\)
  • An illumination correction loss based on Retinex theory:
\[\mathcal{L}_{icl} = \|\mathbf{L}_{\hat{\mathcal{F}}_{raw}} - \mathbf{L}_{\tilde{\mathcal{F}}_{raw}}\|_1 + \|\mathbf{R}_{\hat{\mathcal{F}}_{raw}} - \mathbf{R}_{\mathcal{F}_{raw}}\|_1\]

This ensures illumination improvement while preserving reflectance consistency.
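The core of AICM can be sketched with plain NumPy. The linear map `w, b` stands in for the learned convolutional embedding and cascaded convolutions (assumptions for illustration); a softplus keeps each estimated factor positive, and the factor broadcasts channel-wise over the features, with no GT exposure value involved.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def estimate_amplification(f_raw, w, b):
    """Adaptive average pooling over H, W, then a linear map.

    w and b stand in for AICM's learned conv stack; softplus keeps the
    per-channel factors positive. Returns A_raw of shape (1, 1, C).
    """
    pooled = f_raw.mean(axis=(0, 1))  # (C,) global feature statistics
    return softplus(pooled @ w + b).reshape(1, 1, -1)

def apply_aicm(f_raw, w, b):
    """Amplify RAW features channel-wise using the estimated factors."""
    return f_raw * estimate_amplification(f_raw, w, b)
```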

Diffusion-Based RAW-to-sRGB Reconstruction

Forward diffusion: Encoded sRGB features \(\mathcal{F}_{rgb}\) are progressively noised to Gaussian noise:

\[\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\boldsymbol{\epsilon}_t\]

Reverse diffusion: Conditioned on corrected RAW features \(\hat{\mathcal{F}}_{raw}\), clean sRGB features are recovered from noise:

\[p_\theta(\hat{\mathbf{x}}_{t-1}|\hat{\mathbf{x}}_t, \tilde{\mathbf{x}}) = \mathcal{N}(\hat{\mathbf{x}}_{t-1}; \boldsymbol{\mu}_\theta(\hat{\mathbf{x}}_t, \tilde{\mathbf{x}}, t), \sigma_t^2\mathbf{I})\]
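The forward marginal above admits a one-shot sampling form, sketched below with an assumed linear beta schedule (the paper's schedule is not specified here). The reverse step is omitted because it requires the trained network \(\boldsymbol{\mu}_\theta\) conditioned on \(\hat{\mathcal{F}}_{raw}\).

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t directly from the closed-form marginal q(x_t | x_0)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps
```

Because \(\bar{\alpha}_t\) decays toward zero, late timesteps are dominated by the Gaussian term, which is what lets the reverse process start from pure noise.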

Color Consistency Loss

A KL divergence constraint based on color histograms promotes accurate color mapping:

\[\mathcal{L}_{ccl} = \sum_{c \in [0,C)} \mathcal{H}_{\hat{\mathcal{F}}_{rgb}^c} \log\left(\frac{\mathcal{H}_{\hat{\mathcal{F}}_{rgb}^c}}{\mathcal{H}_{\mathcal{F}_{rgb}^c} + \tau}\right)\]
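A direct reading of the loss can be sketched as below, using hard histograms per color channel. Bin count and epsilon values are illustrative; an actual training implementation would presumably need a soft (differentiable) histogram, which this sketch does not provide.

```python
import numpy as np

def channel_histogram(f, bins=32):
    """Normalized color histogram of one channel over [0, 1]."""
    h, _ = np.histogram(f, bins=bins, range=(0.0, 1.0))
    h = h.astype(np.float64)
    return h / max(h.sum(), 1.0)

def color_consistency_loss(pred, target, tau=1e-6):
    """Per-channel histogram KL divergence, summed over channels."""
    loss = 0.0
    for c in range(pred.shape[-1]):
        p = channel_histogram(pred[..., c])
        q = channel_histogram(target[..., c])
        # tiny epsilon inside the log keeps 0 * log(0) well defined
        loss += float(np.sum(p * np.log((p + 1e-12) / (q + tau))))
    return loss
```

Matching distributions rather than pixels makes the constraint tolerant of spatial misalignment while still penalizing global color casts.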

III. Two-Stage Training

  • Stage 1: The encoder–decoder and AICM are optimized while the diffusion model is frozen. \(\mathcal{L}_{stage1} = \mathcal{L}_{con} + \mathcal{L}_{icl}\)
  • Stage 2: The diffusion model is optimized while all other modules are frozen. \(\mathcal{L}_{stage2} = \mathcal{L}_{cdl} + \lambda\mathcal{L}_{ccl}\) (\(\lambda=0.1\))
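The alternating freeze schedule can be expressed as a small helper. The group names and step-based switch are assumptions for illustration; the paper only specifies which modules are frozen in each stage.

```python
def trainable_groups(step, stage1_steps):
    """Which parameter groups receive gradients at a given training step.

    Stage 1 trains the encoders/decoder and AICM with the diffusion model
    frozen; Stage 2 inverts the freeze. Group names are hypothetical.
    """
    stage1 = step < stage1_steps
    return {
        "raw_encoder": stage1,
        "rgb_encoder": stage1,
        "rgb_decoder": stage1,
        "aicm": stage1,
        "diffusion": not stage1,
    }
```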

Key Experimental Results

Main Results: SIED Dataset (Canon Subset)

| Type | Method | 0.01–0.1 lux (PSNR/SSIM/LPIPS) | 0.001–0.01 lux | 0.0001–0.001 lux |
|---|---|---|---|---|
| Single-stage | SID | 20.69/0.811/0.428 | 20.34/0.799/0.450 | 19.28/0.764/0.497 |
| Single-stage | SGN | 21.79/0.813/0.421 | 21.07/0.800/0.447 | 19.42/0.762/0.514 |
| Multi-stage | DNF | 24.03/0.813/0.456 | 23.47/0.796/0.486 | 21.63/0.769/0.522 |
| Multi-stage | RAWMamba | 22.63/0.791/0.461 | 21.99/0.782/0.482 | 21.05/0.757/0.521 |
| — | Ours | 24.85/0.849/0.360 | 24.02/0.839/0.379 | 22.52/0.811/0.435 |

State-of-the-art performance is achieved across all three illuminance levels and all metrics. At the 0.0001–0.001 lux level, the proposed method outperforms the second-best DNF by +0.89 dB PSNR.

Ablation Study

| Variant | PSNR | SSIM | LPIPS |
|---|---|---|---|
| Single-stage training | 22.96 (−1.89) | 0.809 (−0.040) | 0.431 (+0.071) |
| w/o AICM | 23.23 (−1.62) | 0.839 (−0.010) | 0.378 (+0.018) |
| Fixed amplification = 100 | 23.74 (−1.11) | 0.841 (−0.008) | 0.373 (+0.013) |
| Fixed amplification = 200 | 23.48 (−1.37) | 0.844 (−0.005) | 0.371 (+0.011) |
| Fixed amplification = 300 | 23.18 (−1.67) | 0.838 (−0.011) | 0.382 (+0.022) |
| w/o color consistency loss | 24.53 (−0.32) | 0.836 (−0.013) | 0.389 (+0.029) |
| Full method | 24.85 | 0.849 | 0.360 |

Key findings:

  • Two-stage training is the single largest factor (−1.89 dB when removed); unstable encoded features early in training impede diffusion-model learning.
  • Among the modules, AICM contributes the most (−1.62 dB); adaptive amplification substantially outperforms any fixed amplification factor.
  • The color consistency loss primarily improves SSIM and LPIPS, enhancing color accuracy.

Comparison on the SID Dataset

On the public SID dataset (Sony subset), the proposed method also achieves state-of-the-art performance: PSNR 31.20, surpassing DNF and RAWMamba by +0.58 dB each. Furthermore, the method requires no GT exposure values for pre-amplification.

Highlights & Insights

  1. Data synthesis innovation: The paired-to-paired strategy is more realistic than conventional normal-to-low-light synthesis; optical laboratory calibration ensures precise illuminance control.
  2. Elegant AICM design: By eliminating dependence on GT exposure information, the module is more practical for real-world deployment.
  3. Novel application of diffusion models: The generative capacity and inherent denoising properties of diffusion models are well-suited to RAW enhancement.
  4. Color histogram loss: Distribution-level constraints replace pixel-level supervision, making them more appropriate for color mapping tasks.
  5. Generalization to real scenes: Models trained on synthetic data transfer directly to real extremely dark scenes.

Limitations & Future Work

  • Data synthesis relies on costly optical laboratory equipment.
  • Only Sony and Canon cameras are supported; cross-camera generalization remains to be validated.
  • Diffusion model inference is slow (20-step sampling), limiting real-time applicability.
  • Absolute PSNR values under extremely dark conditions remain modest (22.52 dB), reflecting irreversible information loss.
Related Work & Extensions

  • SID (CVPR 2018): Pioneering work on RAW low-light enhancement, but with a limited illuminance range.
  • DNF (CVPR 2023): A representative multi-stage method and the primary competitor of this work.
  • Retinex theory: Provides the design inspiration for the illumination correction loss.
  • The proposed data synthesis paradigm can be generalized to dataset construction under other extreme conditions (e.g., underwater, hazy scenes).

Rating

⭐⭐⭐⭐ — The dataset contribution is outstanding (filling the gap for extremely dark scenes), and the method design is principled and effective; however, experimental validation is conducted primarily on the authors' own dataset.