Luminance-Aware Statistical Quantization: Unsupervised Hierarchical Learning for Illumination Enhancement¶
Conference: NeurIPS 2025 arXiv: 2511.01510 Code: GitHub Area: Low-Light Image Enhancement / Image Restoration Keywords: Low-light image enhancement, diffusion model, power-law distribution, MCMC sampling, unsupervised learning
TL;DR¶
This paper proposes the LASQ framework, which reformulates low-light image enhancement (LLIE) as a statistical sampling process over hierarchical luminance distributions. By exploiting the power-law distribution inherent in natural luminance transitions, LASQ employs MCMC sampling to generate hierarchical luminance adaptation operators (LAOs) that are embedded into the forward process of a diffusion model, enabling fully unsupervised enhancement without requiring any normal-light reference images.
Background & Motivation¶
Background: LLIE methods are broadly divided into supervised (requiring paired data) and unsupervised approaches; recent integration of diffusion models has improved flexibility.
Limitations of Prior Work: - Supervised methods overfit to pixel-level correspondences, neglecting the continuous physical process underlying luminance transitions. - Unsupervised methods rely on pseudo-references (e.g., empirical gamma correction), inheriting their prior biases. - Both paradigms oversimplify the fundamentally continuous and context-dependent luminance dynamics, leading to limited generalization.
Key Challenge: A tension exists between reconstruction fidelity and cross-scene generalization — optimizing for in-domain accuracy degrades generalization, while prioritizing generalization weakens in-domain performance.
Goal: To establish a statistical model for LLIE grounded in the physical laws of natural illumination, without requiring paired data.
Key Insight: An empirical observation that natural luminance transitions follow a power-law density distribution, which can be approximated by hierarchical power functions.
Core Idea: Reformulate LLIE from deterministic pixel mapping to a statistical sampling process over hierarchical luminance distributions.
Method¶
Overall Architecture¶
Three core components: (1) Hierarchical Luminance Modeling — constructing a luminance variation coordinate system and designing hierarchical LAOs; (2) MCMC Sampling — generating a coarse-to-fine collection of LAOs; (3) Diffusion Model — embedding the hierarchical samples into the forward process for unsupervised learning.
Key Designs¶
- Luminance Variation Coordinate System:
  - Function: Establishes a geometric framework for the relationship between low-light and normal-light luminance.
  - Design Motivation: To mathematically formalize the physical laws governing luminance transitions.
  - Mechanism: For each pixel \(i\), the coordinate point \((I_L^{(i)}, I_N^{(i)})\) is observed to follow a power-law distribution \(y = ax^\kappa\). Different values of \(\kappa\) correspond to distinct adaptation strategies (\(\kappa < 0.5\): dark-region recovery; \(0.5 < \kappa < 1\): midtone enhancement; \(\kappa \to 1\): highlight preservation).
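The role of \(\kappa\) can be sketched numerically. The snippet below is a minimal illustration of the power-law mapping \(y = ax^\kappa\) described above; the specific \(\kappa\) values are illustrative choices within the regimes the paper names, not parameters from the paper.

```python
import numpy as np

def power_law_map(x, a=1.0, kappa=0.4):
    """Map low-light luminance x in [0, 1] to enhanced luminance y = a * x**kappa."""
    return a * np.power(x, kappa)

x = np.linspace(0.01, 1.0, 5)
dark_recovery = power_law_map(x, kappa=0.4)   # kappa < 0.5: strongly lifts dark regions
midtone = power_law_map(x, kappa=0.7)         # 0.5 < kappa < 1: moderate midtone boost
highlight = power_law_map(x, kappa=0.98)      # kappa -> 1: near-identity, preserves highlights
```

For \(x \in (0, 1)\), a smaller exponent yields a larger output, so the three curves order exactly as the three adaptation regimes suggest.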
- Hierarchical Luminance Adaptation Operator (LAO):
  - Function: Constructs multi-scale luminance correction operators ranging from global to local.
  - Mechanism: For a region \(\mathcal{P}\), a luminance scalar \(G_\mathcal{P}\) and the corresponding LAO are computed as:
    $$\gamma_\mathcal{P} = (\alpha + G_\mathcal{P})^{\beta_\mathcal{P}}, \quad \beta_\mathcal{P} = 2G_\mathcal{P} - 1 + \eta\frac{\sigma_{G_\mathcal{P}}^2}{\sigma_{G_\mathcal{P}}^2 + \delta}$$
  - Distribution Modeling: LAOs follow a truncated Gaussian distribution \(\gamma \sim \mathcal{N}_{\text{trunc}}(\mu=\gamma_0, \sigma^2; \gamma_{\min}, \gamma_{\max})\).
  - Physical Interpretation: High-probability operators correspond to physically plausible global adaptations, while low-probability operators capture local fine-grained adjustments.
- MCMC Hierarchical Sampling:
  - Function: Progressively samples from the LAO distribution space to generate a coarse-to-fine set of enhanced images.
  - Mechanism: The \(n\)-th iteration produces \(2^{n-1}\) LAO configurations:
    $$p(\mathcal{I}_H^{(n)}) \approx \sum_{z=1}^{2^{n-1}} p(\mathcal{I}_H^{(n)} \mid \gamma_{\mathcal{P},z}^{(n)})\, p(\gamma_{\mathcal{P},z}^{(n)})$$
    The transition kernel is a truncated Gaussian: \(q(\gamma_z^{(n)} \mid \gamma_{z-1}^{(n)}) = \mathcal{N}_{\text{trunc}}(\gamma_z^{(n)} \mid \gamma_{z-1}^{(n)}, \lambda^2)\).
  - Grid Strategy: At iteration \(n\), the image is partitioned into \(m_n \times w_n\) non-overlapping patches (where \(m_n = 2^{\lceil(n-1)/2\rceil}\)), realizing a coarse-to-fine spatial progression.
- Hierarchically-Guided Diffusion:
  - Function: Embeds the MCMC-sampled hierarchical enhancements into the diffusion forward process.
  - Mechanism: A time mapping \(\psi(t) = \lfloor t \cdot N/T \rfloor\) aligns the \(T\)-step diffusion process with the \(N\)-level hierarchy. Within each time interval \(T_n\), the corresponding \(\mathcal{F}_H^{(\psi(t))}\) serves as the illumination-normalized reference.
  - Training: The objective combines a noise prediction loss \(\mathcal{L}_d\) with a weak guidance loss \(\mathcal{L}_g\) based on global labels.
  - LASQ++ Extension: An adversarial discriminator conditioned on unpaired normal-light references can optionally be incorporated:
    $$\mathcal{L}_{\text{total}} = \lambda_d\mathcal{L}_d + \lambda_g\mathcal{L}_g + \lambda_{\text{GAN}}\,\mathbb{E}[-\log\mathcal{D}_\phi(G_\theta(\mathcal{I}_L))]$$
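The time mapping \(\psi\) is simple enough to state directly in code. A one-line sketch, using the \(T = 1000\) steps and an illustrative \(N = 5\) hierarchy levels (the paper's exact \(N\) is not stated in this summary):

```python
def psi(t, T=1000, N=5):
    """Map diffusion step t in [0, T) to hierarchy level floor(t * N / T)."""
    return (t * N) // T
```

So steps 0..199 use the level-0 reference, steps 200..399 level 1, and so on up to level \(N-1\) for the final interval.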
Loss & Training¶
- Noise prediction loss \(\mathcal{L}_d\) (weight 0.9) + global guidance loss \(\mathcal{L}_g\) (weight 0.005).
- Optional GAN loss (weight 0.7, LASQ++ mode).
- Adam optimizer, learning rate \(2 \times 10^{-5}\), U-Net backbone, \(T=1000\) diffusion steps.
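Putting the reported weights together, the full objective reduces to a weighted sum; a trivial sketch using the weights listed above (the GAN term applies only in LASQ++ mode):

```python
def total_loss(l_d, l_g, l_gan=None, lambda_d=0.9, lambda_g=0.005, lambda_gan=0.7):
    """Weighted training objective: lambda_d*L_d + lambda_g*L_g (+ lambda_GAN*L_GAN in LASQ++)."""
    loss = lambda_d * l_d + lambda_g * l_g
    if l_gan is not None:
        loss += lambda_gan * l_gan
    return loss
```

The heavy skew toward \(\mathcal{L}_d\) (0.9 vs. 0.005) reflects that the global-label guidance is deliberately weak.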
Key Experimental Results¶
Main Results¶
Comparison on paired datasets (LOLv1 / LSRW):
| Type | Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| SL | PyDiff | 23.275 | 0.859 | 0.108 |
| SL | SMG | 23.814 | 0.809 | 0.144 |
| UL | LightenDiffusion | 20.453 | 0.803 | 0.192 |
| UL | NeRCo | 19.738 | 0.740 | 0.239 |
| UL | LASQ | 20.375 | 0.814 | 0.191 |
| UL+ | LASQ++ | 20.481 | 0.807 | 0.205 |
No-reference datasets (DICM/NPE/VV), where LASQ's generalization advantage is clearest:
| Method | DICM NIQE↓ | NPE NIQE↓ | VV NIQE↓ |
|---|---|---|---|
| LightenDiffusion | 3.724 | 3.618 | 2.941 |
| NeRCo | 4.107 | 3.902 | 3.765 |
| LASQ | 3.715 | 3.571 | 2.777 |
On no-reference datasets, LASQ outperforms all compared methods, including supervised ones, demonstrating strong cross-scene generalization.
Ablation Study¶
| Method | LOLv1 PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| Fixed Luminance Adj. | 16.741 | 0.715 | 0.273 |
| Limited Hierarchy (2 levels) | 19.139 | 0.792 | 0.243 |
| LASQ (Full) | 20.375 | 0.814 | 0.191 |
Computational Efficiency¶
| Method | FLOPs (G) | Params (M) | Inference Time (ms) |
|---|---|---|---|
| SCI | 0.13 | — | 50.14 |
| LightenDiffusion | 367.99 | 27.83 | 257.94 |
| LASQ | 219.75 | 24.08 | 213.89 |
LASQ preserves the performance advantages of diffusion models while achieving inference efficiency approaching non-diffusion methods.
Key Findings¶
- Adaptive MCMC sampling substantially outperforms fixed luminance adjustment (PSNR gap of 3.6 dB).
- Intermediate hierarchy levels are indispensable: the two-level simplified variant improves over fixed adjustment but still trails the full LASQ by about 1.2 dB PSNR.
- LASQ surpasses all supervised methods in no-reference scenarios, confirming the generalization advantage of physics-driven modeling.
- Incorporating normal-light references (LASQ++) improves in-domain color fidelity but may slightly reduce generalization.
- Low sensitivity to hyperparameters: PSNR variation remains below 0.3 dB across the tested ranges of \(\alpha\), \(\eta\), \(\lambda_d\), and \(\lambda_g\).
Highlights & Insights¶
- Physics-driven paradigm shift: The first work to reformulate LLIE from deterministic pixel mapping to a statistical process grounded in the physical laws of natural luminance.
- No paired data required: MCMC sampling embedded in the diffusion forward process enables fully unsupervised training, fundamentally eliminating the dependence on paired data.
- Superior generalization: LASQ outperforms even supervised methods on no-reference datasets, demonstrating that physics-based priors generalize more effectively than data-driven mappings.
- Dual-mode compatibility: Seamlessly supports both settings — with and without normal-light references.
- Power-law distribution discovery: The empirical finding that natural luminance transitions follow a power-law distribution is itself a valuable contribution.
Limitations & Future Work¶
- MCMC sampling increases training time, though it is not used at inference.
- The power-law assumption may not hold in extreme regions (e.g., pure black or pure white areas).
- The current static power-law parameterization has not been validated for time-varying scenarios such as video.
- The U-Net backbone could be replaced with more advanced denoising networks (e.g., DiT) for further performance gains.
- Hardware–software co-design for sensor-specific noise characteristics remains unexplored.
Related Work & Insights¶
- Related to but fundamentally distinct from the "curve estimation" paradigm of Zero-DCE — LASQ performs statistical sampling rather than fitting a single curve.
- LightenDiffusion integrates Retinex theory into diffusion steps, whereas LASQ establishes a more general physical framework based on power-law distributions.
- The hierarchical MCMC sampling concept may generalize to other image degradation restoration tasks, such as dehazing and deraining.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ Reformulates LLIE as a statistical sampling problem with a unique and empirically grounded theoretical perspective.
- Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive coverage of paired/no-reference benchmarks, ablation studies, computational efficiency, and hyperparameter sensitivity.
- Writing Quality: ⭐⭐⭐⭐ Rigorous theoretical derivations and clear physical intuition, though mathematical notation is somewhat dense.
- Value: ⭐⭐⭐⭐⭐ Unsupervised, physics-driven, and highly generalizable — of significant practical value for real-world deployment without paired data.