Image Demoiréing in RAW and sRGB Domains

Conference: ECCV 2024
arXiv: 2312.09063
Code: https://github.com/rebeccaeexu/RRID
Area: Others
Keywords: Image demoiréing, RAW domain processing, ISP learning, Frequency domain filtering, Dual-domain fusion

TL;DR

This paper proposes RRID, a framework that jointly uses RAW and sRGB data for image demoiréing. It introduces the Skip-Connection-based Demoiréing Module (SCDM), equipped with a Gated Feedback Module (GFM) and a Frequency Selection Module (FSM), together with an RGB-Guided ISP (RGISP) that learns a device-specific ISP to assist color restoration. On the TMM22 dataset, RRID outperforms the previous state of the art (SOTA) by 0.62dB in PSNR.

Background & Motivation

Background: Taking photos of screen content with smartphones is now an everyday activity, but frequency aliasing between the camera's CFA (Color Filter Array) and the screen's LCD sub-pixels produces moiré patterns that severely degrade image quality. Existing demoiréing methods mainly operate in the sRGB domain; representative examples include DMCNN (multi-resolution CNN), MopNet (edge-guided + pattern attributes), WDNet (dual-branch network in the wavelet domain), FHDMi (two-stage method), and ESDNet (lightweight model for ultra-high-definition images).

Limitations of Prior Work: Demoiréing in the sRGB domain has limited effectiveness because non-linear operations in the ISP (e.g., demosaicing) further corrupt the moiré patterns already present in the RAW domain. Some studies (e.g., RDNet, RawVDmoiré) therefore advocate demoiréing in the RAW domain. However, relying solely on RAW data has a serious drawback: the RAW-to-sRGB transformation (the ISP process) is not known to the model. With RAW data alone, the model cannot obtain accurate color correction information, so its outputs show obvious color casts.

Key Challenge: RAW-domain demoiréing removes patterns effectively but causes color casts, while sRGB-domain demoiréing preserves color information but struggles to remove the patterns. Each domain alone has an inherent deficiency, so neither can deliver both clean textures and accurate color.

Goal: Exploit the complementary strengths of the RAW and sRGB domains at the same time: (1) the RAW domain provides more pristine data with weaker moiré patterns for pattern removal; (2) the sRGB domain provides color references produced by the device ISP for color restoration; (3) a learned device-specific ISP performs accurate RAW-to-sRGB translation.

Key Insight: Modern smartphones and DSLR cameras can capture RAW and sRGB images simultaneously (e.g., iPhone 15 Pro, Huawei P60 Pro), so paired RAW-sRGB data are available in practice. Building on this, the authors feed both RAW and sRGB images to the model and let it learn a device-specific ISP that assists color restoration.

Core Idea: By jointly using RAW data (better for pattern removal) and sRGB data (better for color accuracy), the method achieves pattern removal and color correction simultaneously through dedicated demoiréing modules and a learnable ISP.

Method

Overall Architecture

The inputs of RRID are the paired RAW image \(\mathbf{I}_{raw} \in \mathbb{R}^{H/2 \times W/2 \times 4}\) (packed RGGB) and the sRGB image \(\mathbf{I}_{rgb} \in \mathbb{R}^{H \times W \times 3}\). The system consists of four stages: (1) shallow feature extraction: features are extracted from the RAW and sRGB inputs separately using convolution + DCAB, and the sRGB branch is downsampled to match the RAW resolution; (2) SCDM demoiréing: multi-scale demoiréing is performed separately on the RAW features (with GFM) and the sRGB features (with FSM), yielding the pre-demoiréed features \(\mathbf{D}_{raw}\) and \(\mathbf{D}_{rgb}\); (3) RGISP color conversion: the color information in \(\mathbf{D}_{rgb}\) guides the RAW-to-sRGB color space transformation of \(\mathbf{D}_{raw}\); (4) refinement: 4 RSTBs (Residual Swin Transformer Blocks) perform global tone mapping and detail refinement and output the final demoiréed sRGB image.
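The packed RGGB input format \(\mathbf{I}_{raw} \in \mathbb{R}^{H/2 \times W/2 \times 4}\) can be illustrated with a short NumPy sketch. It assumes a Bayer mosaic whose top-left 2×2 tile is laid out as R, G / G, B, and it returns channels in an order (R, G1, G2, B) chosen here for illustration; the authors' data pipeline may order or normalize the channels differently, and `pack_rggb` is a hypothetical name.

```python
import numpy as np


def pack_rggb(bayer: np.ndarray) -> np.ndarray:
    """Pack an (H, W) RGGB Bayer mosaic into an (H/2, W/2, 4) array.

    Assumes the top-left 2x2 tile is laid out as
        R  G
        G  B
    and returns channels in the (illustrative) order R, G1, G2, B.
    """
    h, w = bayer.shape
    if h % 2 or w % 2:
        raise ValueError("Bayer mosaic must have even height and width")
    return np.stack(
        [
            bayer[0::2, 0::2],  # R: top-left of each 2x2 tile
            bayer[0::2, 1::2],  # G1: top-right
            bayer[1::2, 0::2],  # G2: bottom-left
            bayer[1::2, 1::2],  # B: bottom-right
        ],
        axis=-1,
    )


# A 4x4 mosaic packs into a 2x2x4 tensor: half the spatial resolution of the
# matching 4x4x3 sRGB image, which is why the sRGB branch is downsampled.
bayer = np.arange(16, dtype=np.float32).reshape(4, 4)
packed = pack_rggb(bayer)
print(packed.shape)  # (2, 2, 4)
print(packed[0, 0])  # [0. 1. 4. 5.]
```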

Key Designs

Skip-Connection-based Demoiréing Module (SCDM) + GFM/FSM:
- Function: Perform targeted multi-scale demoiréing on both RAW and sRGB branches respectively.
- Mechanism: SCDM is a multi-scale U-Net built from DCABs (Dilated Channel Attention Blocks), which use multiple dilated convolutional layers to expand the receptive field and channel attention to rescale features adaptively. The core innovation is embedding domain-specific demoiréing modules in the skip connections. GFM (Gated Feedback Module) is used in the RAW branch to adaptively distinguish texture details from moiré patterns through feature gating: the intermediate features are split along the channel dimension into \(\mathbf{F}_{gate}\) and \(\mathbf{F}_{content}\), and the content is selectively preserved by element-wise multiplication with the GELU-activated gate. FSM (Frequency Selection Module) is used in the sRGB branch and suppresses moiré patterns in the frequency domain with a learnable band-reject filter: in the 8×8 block-DCT domain, a convolutional layer learns adaptive frequency-selective attenuation.
- Design Motivation: Placing the demoiréing modules in the skip connections rather than the backbone has two major advantages: (1) efficiency: the block DCT in FSM is computationally expensive, and placing it in the backbone raises inference time to 4.6 seconds, versus only 0.089 seconds in the skip connections; (2) better information flow: the skip connections carry multi-scale encoder features, so demoiréing them removes moiré patterns while preserving high-frequency details.
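
The gating step of GFM can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: it covers only the channel split and the GELU-gated multiplication described above, assumes the first half of the channels is the gate and the second half the content, and uses the common tanh approximation of GELU. The convolutions and feedback path around the gate in the actual module are omitted, and `gfm_gate` is a hypothetical name.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    """GELU activation (tanh approximation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def gfm_gate(features: np.ndarray) -> np.ndarray:
    """Split (C, H, W) features into gate/content halves and gate the content.

    Assumption: the first C/2 channels form F_gate and the last C/2 form
    F_content. Returns an array of shape (C/2, H, W).
    """
    c = features.shape[0]
    if c % 2:
        raise ValueError("channel count must be even")
    f_gate, f_content = features[: c // 2], features[c // 2 :]
    # GELU is close to zero for strongly negative inputs, so those positions
    # suppress the content; strongly positive gates pass (and scale) it.
    return gelu(f_gate) * f_content


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
print(gfm_gate(x).shape)  # (4, 4, 4)
```

Because the gate is computed from the features themselves, the network can learn to drive it negative where the content looks like moiré and positive where it looks like genuine texture.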
RGB Guided ISP (RGISP):
- Function: Utilize the color information from pre-demoiréed sRGB features to learn a device-specific ISP, converting RAW features to the sRGB domain.
- Mechanism: Inspired by traditional ISPs, which perform color space conversion through a matrix transformation, RGISP is implemented with transposed cross-attention. It generates the query \(\mathbf{Q}\) and key \(\mathbf{K}\) from the RAW features \(\mathbf{D}_{raw}\) and the value \(\mathbf{V}\) from the sRGB features \(\mathbf{D}_{rgb}\). The transformation matrix is computed as \(\mathbf{M} = \text{Softmax}(\mathbf{Q} \cdot \mathbf{K}^T / \lambda) \in \mathbb{R}^{C \times C}\), where \(\lambda\) is a scaling factor, and the output is \(\mathbf{D}_{out} = \mathbf{M} \cdot \mathbf{V}\). \(\mathbf{M}\) effectively acts as a learned, globally shared channel-wise color transformation matrix (analogous to the global white balance and color space settings of an ISP), while depth-wise and point-wise convolutions perform local detail correction.
- Design Motivation: Compared with directly concatenating RAW and sRGB features or using self-attention, cross-attention lets the color information of the sRGB features better guide the color conversion of the RAW features. In the ablation study, RGISP outperforms self-attention and the traditional RRM by 0.34dB and 0.67dB, respectively.
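
A minimal NumPy sketch of the transposed cross-attention at the core of RGISP, following the equations above: \(\mathbf{Q}\) and \(\mathbf{K}\) come from \(\mathbf{D}_{raw}\), \(\mathbf{V}\) from \(\mathbf{D}_{rgb}\), and the softmax yields a \(C \times C\) matrix \(\mathbf{M}\) shared by every spatial position. The projection weights `w_q`, `w_k`, `w_v` (stand-ins for 1×1 convolutions), the L2 normalization of Q and K along the spatial axis (as in Restormer-style transposed attention) and the default `lam` are assumptions; the depth-wise/point-wise convolutions for local correction are omitted.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def transposed_cross_attention(d_raw, d_rgb, w_q, w_k, w_v, lam=1.0):
    """Channel-wise (transposed) cross-attention.

    d_raw, d_rgb: (C, H, W) feature maps.
    w_q, w_k, w_v: (C, C) projection weights (hypothetical 1x1-conv stand-ins).
    Returns the (C, H, W) output and the (C, C) matrix M.
    """
    c, h, w = d_raw.shape
    raw = d_raw.reshape(c, h * w)
    rgb = d_rgb.reshape(c, h * w)
    q = l2_normalize(w_q @ raw)  # (C, HW) query from RAW features
    k = l2_normalize(w_k @ raw)  # (C, HW) key from RAW features
    v = w_v @ rgb                # (C, HW) value from sRGB features
    m = softmax(q @ k.T / lam)   # (C, C): one matrix for the whole image
    out = m @ v                  # each output channel mixes the value channels
    return out.reshape(c, h, w), m


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
d_raw, d_rgb = rng.standard_normal((2, C, H, W))
w_q, w_k, w_v = rng.standard_normal((3, C, C))
out, M = transposed_cross_attention(d_raw, d_rgb, w_q, w_k, w_v)
print(out.shape, M.shape)  # (4, 8, 8) (4, 4)
```

Since the attention map is \(C \times C\) rather than \(HW \times HW\), its cost grows linearly with image size, and the same channel-mixing matrix is applied at every pixel, much like a global color correction matrix.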
DCAB (Dilated Channel Attention Block):
- Function: Act as the basic building block of SCDM, providing feature encoding and decoding with a large receptive field.
- Mechanism: DCAB consists of a series of dilated convolutional layers (with different dilation rates) + ReLU activation + channel attention mechanism, paired with residual connections. Dilated convolutions expand the receptive field without increasing parameter size, which is critical for detecting and removing multi-scale moiré patterns. Channel attention adaptively recalibrates the importance of each channel.
- Design Motivation: Moiré patterns exhibit multi-scale characteristics: different frequency interferences produce patterns of varying scales. Dilated convolutions expand the receptive field more efficiently than simply stacking more convolutional layers.
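
Two pieces of DCAB can be made concrete. The first function computes the receptive field of a stack of stride-1 dilated convolutions, showing why dilation widens context without adding weights; the dilation rates (1, 2, 4) are hypothetical, since the rates DCAB actually uses are not given here. The second is a squeeze-and-excitation-style channel attention (global average pooling, bottleneck, sigmoid rescaling), assumed as a typical form; DCAB's exact layer sizes may differ, and `w_down`/`w_up` are hypothetical weights.

```python
import numpy as np


def receptive_field(kernel_size: int, dilations) -> int:
    """Receptive field of stacked stride-1 convolutions with given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k-1)*d
    return rf


def channel_attention(x: np.ndarray, w_down: np.ndarray, w_up: np.ndarray) -> np.ndarray:
    """Rescale the channels of x (C, H, W) by weights in (0, 1).

    w_down: (C//r, C), w_up: (C, C//r) for some reduction ratio r.
    """
    s = x.mean(axis=(1, 2))                    # squeeze: (C,)
    z = np.maximum(w_down @ s, 0.0)            # bottleneck + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w_up @ z)))  # sigmoid: (C,)
    return x * scale[:, None, None]            # excite: per-channel scaling


# Three 3x3 layers with the same parameter count, very different context.
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15, as much as seven undilated 3x3 layers
```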

Loss & Training

The total loss is a weighted sum of L1 losses in the RAW and sRGB domains: \(\mathcal{L} = \alpha \|\hat{\mathbf{Y}}_{raw} - \mathbf{Y}_{raw}\|_1 + \|\hat{\mathbf{Y}}_{rgb} - \mathbf{Y}_{rgb}\|_1\), where \(\alpha=0.5\). Training uses the AdamW optimizer (\(\beta_1=0.9, \beta_2=0.999\)) with a multi-step learning rate schedule starting at \(2 \times 10^{-4}\), for 500 epochs with a batch size of 80 on 4 RTX 3090 GPUs.
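
The training objective translates directly from the formula above. This NumPy sketch assumes the L1 norm is averaged over all elements (mean absolute error), a common convention in restoration code; the paper's implementation may reduce differently, and `rrid_loss` is a hypothetical name.

```python
import numpy as np


def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error (assumed reduction for the L1 norm)."""
    return float(np.mean(np.abs(pred - target)))


def rrid_loss(pred_raw, gt_raw, pred_rgb, gt_rgb, alpha: float = 0.5) -> float:
    """alpha * L1 on the RAW output + L1 on the sRGB output."""
    return alpha * l1_loss(pred_raw, gt_raw) + l1_loss(pred_rgb, gt_rgb)


# Packed RAW is (H/2, W/2, 4) and sRGB is (H, W, 3); here H = W = 4.
loss = rrid_loss(
    np.zeros((2, 2, 4)), np.ones((2, 2, 4)),       # RAW error of 1 everywhere
    np.zeros((4, 4, 3)), np.full((4, 4, 3), 2.0),  # sRGB error of 2 everywhere
)
print(loss)  # 2.5
```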

Key Experimental Results

Main Results (TMM22 Dataset)

| Method | Input Type | PSNR | SSIM | LPIPS | Parameters (M) | Inference Time (s) |
|---|---|---|---|---|---|---|
| DMCNN | sRGB | 23.54 | 0.885 | 0.154 | 1.55 | 0.052 |
| ESDNet | sRGB | 26.77 | 0.927 | 0.089 | 5.93 | 0.115 |
| RDNet | RAW | 26.16 | 0.921 | 0.091 | 6.04 | 1.094 |
| RawVDmoiré | RAW | 27.26 | 0.935 | 0.075 | 5.33 | 0.182 |
| RRID (Ours) | sRGB+RAW | 27.88 | 0.938 | 0.079 | 2.38 | 0.089 |
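
The relative figures quoted in these notes (the 0.62dB PSNR gain in the TL;DR, ESDNet's roughly 2.5× parameter count and 1.1dB gap, and RRID's speed advantage over RawVDmoiré) follow directly from this table. A short Python check, with values copied from the rows above:

```python
# (PSNR, parameters in M, inference time in s), copied from the table above.
results = {
    "DMCNN": (23.54, 1.55, 0.052),
    "ESDNet": (26.77, 5.93, 0.115),
    "RDNet": (26.16, 6.04, 1.094),
    "RawVDmoiré": (27.26, 5.33, 0.182),
    "RRID": (27.88, 2.38, 0.089),
}

rrid_psnr, rrid_params, rrid_time = results["RRID"]
# Best competing method by PSNR.
runner_up = max((n for n in results if n != "RRID"), key=lambda n: results[n][0])

psnr_gain = round(rrid_psnr - results[runner_up][0], 2)
esdnet_param_ratio = round(results["ESDNet"][1] / rrid_params, 1)
esdnet_psnr_gap = round(rrid_psnr - results["ESDNet"][0], 2)
speedup_vs_runner_up = round(results[runner_up][2] / rrid_time, 1)

print(runner_up, psnr_gain)                 # RawVDmoiré 0.62
print(esdnet_param_ratio, esdnet_psnr_gap)  # 2.5 1.11
print(speedup_vs_runner_up)                 # 2.0
```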

Ablation Study

| Configuration | PSNR | SSIM | Description |
|---|---|---|---|
| B6: Full RRID | 27.88 | 0.938 | Full model |
| B1: w/o RAW input and RAW branch | 25.79 | 0.915 | PSNR drops by about 2.1dB |
| B2: w/o sRGB input and sRGB branch | 27.24 | 0.929 | Causes severe color cast |
| B4: w/o RGISP | 27.38 | 0.932 | Weaker color correction |
| S1: w/o GFM & FSM | 27.00 | 0.926 | Demoiréing capability degrades significantly |
| S5: Completely replacing GFM with FSM | 27.32 | 0.930 | Domain-specific design outperforms unified design |

Key Findings

The RAW input contributes significantly—removing the RAW branch results in a PSNR drop of over 2dB, verifying the superiority of demoiréing in the RAW domain.
Using RAW only (B2) yields good demoiréing performance but causes severe color casts; adding the sRGB input (full model) removes the color cast and raises PSNR by a further 0.64dB.
The domain-specific design of GFM and FSM (GFM for RAW, FSM for sRGB) outperforms swapping or unified designs—the PSNR drops by 0.3-0.5dB after swapping them.
Placing the demoiréing modules in the skip connections (0.089s) is roughly 50 times faster than placing them in the backbone (4.6s) and also performs better.
The cross-attention in RGISP outperforms self-attention and traditional RRM by 0.34dB and 0.67dB respectively.
On the pure sRGB dataset FHDMi (with the RAW branch removed), RRID still achieves runner-up performance, demonstrating generalization ability.

Highlights & Insights

Complementary dual-domain design: The physical advantages of RAW (12-14 bit depth, no non-linear ISP processing, weaker moiré patterns) and the color advantages of sRGB complement each other, and the cross-attention in RGISP combines them. This "best of both worlds" dual-input design could transfer to other joint RAW+sRGB tasks.
Efficiency design of demoiréing in skip connections: Placing the computationally intensive FSM (Block DCT) in the skip connections instead of the main stream reduces inference time from 4.6 seconds to 0.089 seconds—this architectural design trick of "placing heavy modules in side branches" is highly practical.
Learnable band-reject filters instead of handcrafted frequency priors: Moiré patterns have characteristic signatures in the frequency domain, but suitable filters for them are difficult to design by hand. Learning an adaptive band-reject filter with convolutional layers in the block-DCT domain keeps the benefits of frequency-domain processing while avoiding the limitations of handcrafted designs.
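
The block-DCT filtering idea can be sketched in NumPy: each 8×8 block is transformed with an orthonormal DCT-II, every frequency coefficient is multiplied by a gain, and the block is transformed back. In FSM the attenuation is learned by a convolutional layer; here a fixed `mask` stands in for those learned gains, the input is a single-channel image rather than a multi-channel feature map, and `block_dct_filter` is a hypothetical name.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis D: coeffs = D @ block @ D.T, block = D.T @ coeffs @ D."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d


def block_dct_filter(image: np.ndarray, mask: np.ndarray, n: int = 8) -> np.ndarray:
    """Scale frequency (u, v) of every n x n block by mask[u, v]."""
    h, w = image.shape
    if h % n or w % n:
        raise ValueError("image size must be a multiple of the block size")
    d = dct_matrix(n)
    # (H, W) -> (H/n, W/n, n, n): one n x n block per grid cell
    blocks = image.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    coeffs = d @ blocks @ d.T             # forward 2D DCT of every block
    filtered = d.T @ (coeffs * mask) @ d  # attenuate, then inverse DCT
    return filtered.transpose(0, 2, 1, 3).reshape(h, w)


# Smooth content (a constant) plus a synthetic periodic pattern built from one
# high-frequency DCT basis function, repeated over a 16x16 image.
d = dct_matrix(8)
pattern = 5.0 * np.tile(np.outer(d[6], d[6]), (2, 2))
image = 3.0 + pattern

mask = np.ones((8, 8))
mask[5:, 5:] = 0.0  # reject the band of frequencies with u, v >= 5
restored = block_dct_filter(image, mask)
print(np.allclose(restored, 3.0))  # True
```

Real moiré rarely aligns with a single DCT basis function, which is why a learned, content-adaptive mask is more useful than the fixed one used here.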

Limitations & Future Work

The TMM22 training data have a resolution of only 256×256, which limits the model's ability to learn global tone and color mapping; the authors acknowledge that color restoration remains imperfect under severe color casts.
Currently, TMM22 is the only paired RAW-sRGB demoiréing dataset, highlighting an urgent need for larger-scale, higher-resolution datasets.
The block size of the block DCT in FSM is fixed at 8×8; adaptive block-size selection could handle moiré patterns of different frequencies more effectively.
RGISP learns channel-wise color conversion, which has limited capacity to handle local color cast (e.g., varying color temperatures in different screen regions).
Comparison with the latest DSDNet (a RAW domain demoiréing method proposed in 2025) is not provided.

Comparison with Related Work

vs RDNet: A pioneering work in RAW-domain demoiréing using only RAW inputs + pre-trained ISPs for color conversion; this paper demonstrates the significant advantage of jointly utilizing sRGB-domain inputs.
vs RawVDmoiré: A RAW-domain video demoiréing method that achieves runner-up performance. By incorporating the sRGB branch, RRID outperforms it in PSNR while maintaining a faster inference speed.
vs ESDNet: A lightweight demoiréing model designed specifically for ultra-high-definition sRGB images; it has 2.5 times the parameter size of RRID but scores 1.1dB lower in PSNR.
vs CR3Net: Also utilizes paired RAW-sRGB data, but designed for reflection removal instead of demoiréing. CR3Net performs poorly on the demoiréing task (23.75dB), compared to RRID's 27.88dB.

Rating

Novelty: ⭐⭐⭐⭐ Joint RAW+sRGB dual-domain demoiréing is pioneered here, and the design of RGISP for learning device-specific ISPs is insightful.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Three levels of ablation studies (architecture/inputs, SCDM, RGISP) and cross-dataset generalization verification.
Writing Quality: ⭐⭐⭐⭐ The structure is clear, and the design motivation of each module is fully explained.
Value: ⭐⭐⭐⭐ It opens a new direction for joint RAW+sRGB image restoration with clear practical application prospects.