DarkIR: Robust Low-Light Image Restoration

Conference: CVPR 2025
arXiv: 2412.13443
Code: https://github.com/cidautai/DarkIR
Area: Image Restoration
Keywords: Low-light restoration, multi-task, frequency MLP, dilated attention, efficient CNN

TL;DR

DarkIR is an efficient CNN for multi-task low-light image restoration. The encoder uses SpAM + FreMLP (frequency-magnitude enhancement) to handle illumination, while the decoder uses Di-SpAM (dilated spatial attention) to handle blur. With this asymmetric design, DarkIR-m reaches 26.62 dB PSNR on LOLBlur with only 3.31M parameters, and the larger DarkIR-l reaches 27.30 dB with 12.96M parameters.

Background & Motivation

Background: Low-light image restoration faces three coupled degradations: noise, blur, and insufficient illumination. Existing methods either address a single degradation (such as only enhancing brightness) or utilize Transformers (such as RetinexFormer, Restormer) which have large parameter sizes.
Limitations of Prior Work: (1) Transformer-based methods have a large number of parameters (Restormer has 26M+), making them unsuitable for edge deployment; (2) Existing CNN methods lack specialized designs for joint low-light and blur degradation; (3) Frequency-domain information is underutilized in low-light restoration.
Key Challenge: Brightness enhancement and deblurring require different features: the former requires global illumination estimation (frequency domain), while the latter requires local structure recovery with a large receptive field (spatial domain). Processing both with a single unified module is therefore inefficient.
Goal: To design an asymmetric encoder-decoder where the encoder focuses on illumination enhancement (frequency domain) and the decoder focuses on deblurring (large spatial receptive field).
Key Insight: Insufficient illumination in low-light conditions primarily manifests as amplitude attenuation in the frequency domain, whereas blur requires a large spatial receptive field. These two aspects are suited for different attention mechanisms.
Core Idea: The encoder employs FreMLP (which operates solely on FFT amplitude without altering phase) for illumination enhancement, while the decoder utilizes Di-SpAM (three groups of depthwise convolutions with different dilation rates) for deblurring.

Method

Overall Architecture

Low-light blurry image → Encoder (SpAM + FreMLP, progressive 8x downsampling) → Bottleneck → Decoder (Di-SpAM + Gated FFN, progressive upsampling) → Residual Connection → Restored Image. Asymmetric design: the encoder processes frequency-domain enhancement, and the decoder processes spatial-domain deblurring.
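To make the resolution flow concrete, the following sketch traces feature-map sizes through the encoder and back up the decoder. It assumes the 8x factor is reached through three stages that each halve the resolution; the stage count and the helper name `trace_shapes` are illustrative assumptions, not details confirmed by the summary above.

```python
def trace_shapes(height, width, num_stages=3):
    """Return the (H, W) resolution at each encoder level and each decoder level.

    Assumption: every encoder stage halves the resolution and every decoder
    stage doubles it, so num_stages=3 gives the overall 8x factor.
    """
    factor = 2 ** num_stages
    if height % factor or width % factor:
        raise ValueError(f"height and width must be divisible by {factor}")
    # Level 0 is the input resolution; the last level is the bottleneck.
    encoder = [(height >> s, width >> s) for s in range(num_stages + 1)]
    # The decoder mirrors the encoder, from the bottleneck back to full size.
    decoder = list(reversed(encoder))
    return encoder, decoder


encoder, decoder = trace_shapes(256, 256)
print("encoder:", encoder)
print("decoder:", decoder)
```

Under these assumptions a 256x256 input reaches a 32x32 bottleneck, which is where a large receptive field is cheapest to obtain.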

Key Designs

Frequency Domain MLP (FreMLP)
- Function: To enhance the amplitude of low-light images in the frequency domain.
- Mechanism: FFT → apply MLP transformation only to amplitude (phase remains unchanged) → IFFT. Inverted residual structure + Simplified Channel Attention (SCA).
- Design Motivation: Low-light degradation primarily manifests as frequency amplitude attenuation (especially in low-frequency components). Performing enhancement directly on the amplitude is more efficient and targeted than global spatial operations.
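The amplitude-only idea behind FreMLP can be illustrated on a 1D signal with the standard library. This is a minimal sketch, not the authors' module: the learned MLP is replaced by a hypothetical fixed gain (`amplitude_fn`), the inverted residual structure and SCA are omitted, and a naive DFT stands in for a real FFT.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short demo."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def amplitude_only_transform(signal, amplitude_fn):
    """Apply amplitude_fn to each frequency magnitude and leave the phase untouched."""
    transformed = []
    for coeff in dft(signal):
        magnitude, phase = cmath.polar(coeff)
        transformed.append(cmath.rect(amplitude_fn(magnitude), phase))
    # The input is real, so only the real part of the reconstruction is kept.
    return [value.real for value in idft(transformed)]


dark = [0.10, 0.12, 0.30, 0.28, 0.11, 0.10, 0.25, 0.27]
# Stand-in for the learned MLP (assumption): a constant gain of 3 on every magnitude.
bright = amplitude_only_transform(dark, lambda m: 3.0 * m)
print([round(v, 2) for v in bright])
```

With a constant gain the result is simply the input scaled by the same factor, because the transform is linear and the phase is preserved. A learned, frequency-dependent mapping would instead reshape the spectrum while still keeping the structure carried by the phase.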
Dilated Spatial Attention (Di-SpAM)
- Function: To acquire a large receptive field for deblurring with low computational cost.
- Mechanism: Three groups of depthwise convolutions use different dilation rates (1, 4, 9) → pooling fusion → generate a spatial attention map with a large receptive field.
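- Design Motivation: Deblurring requires a large receptive field to model the motion range, but large convolutional kernels are computationally expensive. Dilated convolutions enlarge the receptive field without adding weights: a \(k \times k\) kernel with dilation \(d\) spans \(d(k-1)+1\) pixels per side, so the branches with rates 4 and 9 cover a much wider area than a standard convolution with the same number of parameters.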
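The span of a single dilated kernel follows the standard formula \(d(k-1)+1\). The sketch below evaluates it for the three dilation rates listed above, assuming a 3x3 depthwise kernel; the kernel size is an assumption for illustration, since the summary does not state it.

```python
def dilated_extent(kernel_size, dilation):
    """Side length (in pixels) covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


KERNEL_SIZE = 3  # assumed kernel size, not taken from the summary

for rate in (1, 4, 9):
    extent = dilated_extent(KERNEL_SIZE, rate)
    weights = KERNEL_SIZE ** 2  # weights per channel do not depend on the rate
    dense_weights = extent ** 2  # weights a dense kernel of the same span would need
    print(f"dilation={rate}: spans {extent}x{extent}, "
          f"{weights} weights vs {dense_weights} for a dense kernel")
```

Under this assumption the rate-9 branch spans 19x19 pixels with 9 weights per channel, where a dense kernel of the same span would need 361.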
Multi-Task Joint Training
- Function: To simultaneously handle denoising, deblurring, and brightness enhancement.
- Mechanism: \(\mathcal{L} = \lambda_p L_1 + \lambda_{pe} L_{percep} + \lambda_{ed} L_{edge} + L_{lol}\), where \(L_{lol} = ||x_{\downarrow 8} - \hat{x}_{\downarrow 8}||_1\) is the scale-guide loss at 8x downsampling.
- Design Motivation: The three degradations co-exist in low-light scenarios, and handling them separately accumulates errors. \(L_{lol}\) ensures low-resolution structural consistency.

Loss & Training

The total loss combines L1 (\(\lambda_p=1\)), an LPIPS perceptual loss (\(\lambda_{pe}=0.01\)), an edge loss (\(\lambda_{ed}=50\)), and the low-resolution guided loss \(L_{lol}\), which enters with unit weight.
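The weighted sum can be sketched in plain Python as follows. Several points are assumptions made for illustration: images are 2D lists of floats, the 8x downsampling is done by average pooling, the L1 terms are averaged rather than summed, and the perceptual and edge terms are passed in as precomputed numbers because they require external models or filters. In the actual method the low-resolution estimate may come from the network itself rather than from downsampling the full-resolution output.

```python
def downsample(image, factor=8):
    """Average-pool a 2D list by `factor` (assumed downsampling operator)."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError(f"image size must be divisible by {factor}")
    area = factor * factor
    return [[sum(image[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / area
             for x in range(width // factor)]
            for y in range(height // factor)]


def l1(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    diffs = [abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def total_loss(pred, target, perceptual, edge,
               lambda_p=1.0, lambda_pe=0.01, lambda_ed=50.0):
    """Weighted sum: L1 + perceptual + edge + low-resolution guided term."""
    l_lol = l1(downsample(pred), downsample(target))  # unit weight
    return lambda_p * l1(pred, target) + lambda_pe * perceptual + lambda_ed * edge + l_lol


pred = [[0.5] * 16 for _ in range(16)]
target = [[0.6] * 16 for _ in range(16)]
# Perceptual and edge terms set to zero here to isolate the two L1 terms.
loss = total_loss(pred, target, perceptual=0.0, edge=0.0)
print(round(loss, 4))
```

The large edge weight relative to the perceptual weight mostly reflects the different numeric scales of the terms, so the weights should not be read as a ranking of importance.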

Key Experimental Results

Main Results

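| Method | LOLBlur PSNR (dB)↑ | LOLBlur SSIM↑ | LOLBlur LPIPS↓ | Parameters |
| --- | --- | --- | --- | --- |
| LEDNet | 26.30 | - | 0.224 | 7.4M |
| RetinexFormer | 26.02 | - | 0.181 | 1.61M |
| Restormer | - | - | - | 26.13M |
| DarkIR-m | 26.62 | 0.891 | 0.148 | 3.31M |
| DarkIR-l | 27.30 | 0.898 | 0.137 | 12.96M |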
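For readers interpreting the PSNR column, the metric is \(10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})\). The sketch below computes it for flat lists of pixel values in \([0, 1]\) and converts a PSNR gap into a mean-squared-error ratio. The helper names are illustrative, and the benchmark's exact evaluation protocol (color space, cropping) is not specified here.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equally long pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


def mse_ratio(psnr_gain_db):
    """MSE of the better method divided by MSE of the baseline."""
    return 10 ** (-psnr_gain_db / 10)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01.
print(round(psnr([0.5] * 4, [0.6] * 4), 2))
# The 1.0 dB gap between DarkIR-l (27.30) and LEDNet (26.30) as an MSE ratio.
print(round(mse_ratio(27.30 - 26.30), 3))
```

Because the scale is logarithmic, a 1 dB gain corresponds to roughly a 21% lower mean squared error.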

Ablation Study

| Configuration | LOLv2-Real PSNR | Description |
| --- | --- | --- |
| DarkIR-mt (Multi-task) | 23.87 | Multi-task version |
| DarkIR (Single-task) | - | LOLBlur +0.68 vs Multi-task |
| w/o FreMLP | Decrease | Key to frequency-domain enhancement |
| w/o Di-SpAM | Decrease | Key to deblurring |

Key Findings

DarkIR-m, with only 3.31M parameters, is 55% smaller than LEDNet (7.4M) and 87% smaller than Restormer (26.13M), yet achieves a higher LOLBlur PSNR than LEDNet (26.62 dB vs 26.30 dB).
The multi-task version (DarkIR-mt) loses only 0.4dB PSNR compared to the single-task version, indicating a very low cost for multi-task training.
The contribution of frequency-domain FreMLP for low-light enhancement is greater than that of Di-SpAM for deblurring, indicating that brightness restoration is the more critical sub-task.
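The size comparisons can be recomputed from the parameter counts in the results table. The short sketch below does so; the helper name `reduction_percent` is illustrative.

```python
def reduction_percent(small, large):
    """Percentage by which `small` is smaller than `large`."""
    return 100.0 * (1.0 - small / large)


# Parameter counts in millions, taken from the main results table.
params_m = {"DarkIR-m": 3.31, "LEDNet": 7.4, "Restormer": 26.13}

for baseline in ("LEDNet", "Restormer"):
    pct = reduction_percent(params_m["DarkIR-m"], params_m[baseline])
    ratio = params_m[baseline] / params_m["DarkIR-m"]
    print(f"DarkIR-m vs {baseline}: {pct:.1f}% fewer parameters ({ratio:.1f}x smaller)")
```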

Highlights & Insights

Asymmetric Encoder-Decoder Design: The encoder and decoder focus on different types of degradation. This "division of labor" concept is highly generalizable for multi-task restoration.
FreMLP Operates Only on Amplitude, Leaving Phase Unchanged: It preserves phase information (i.e., structural information) and only enhances energy, which physically corresponds to illumination restoration.
Extreme Parameter Efficiency: With only 3.31M parameters, DarkIR-m outperforms models several times larger on the joint low-light and deblurring task.

Limitations & Future Work

The synthetic LOLBlur dataset uses frame averaging to simulate blur and EC-Zero-DCE to reduce brightness, which still deviates from real-world nighttime degradations.
Real-LOLBlur lacks ground-truth images, preventing quantitative perceptual evaluation.
The frequency-domain method only processes amplitude; utilizing phase information might further improve restoration quality.
It assumes blur arises from long exposures and does not explicitly address other blur sources (such as object motion).

vs RetinexFormer: A Retinex-theory-based Transformer method that achieves 26.02 dB PSNR on LOLBlur, versus 26.62 dB for DarkIR-m and 27.30 dB for DarkIR-l. DarkIR outperforms it with a simpler CNN structure, although RetinexFormer has fewer parameters (1.61M).
vs Restormer: A general image restoration Transformer with 26M+ parameters. DarkIR is specifically designed for low light, and DarkIR-m uses roughly 8x fewer parameters (3.31M vs 26.13M).
vs LEDNet: A specialized low-light deblurring method. DarkIR-m achieves +0.32 dB PSNR with 55% fewer parameters, while DarkIR-l achieves +1.0 dB PSNR with 12.96M parameters.

Rating

Novelty: ⭐⭐⭐⭐ The asymmetric design and FreMLP offer technical novelty.
Experimental Thoroughness: ⭐⭐⭐⭐ Evaluated on LOLBlur, LOLv2, and Real data with multiple ablation variants.
Writing Quality: ⭐⭐⭐⭐ Clear.
Value: ⭐⭐⭐⭐ An edge-deployment-friendly low-light restoration solution.