Learnable Fractional Reaction-Diffusion Dynamics for Under-Display ToF Imaging and Beyond

Conference: ICCV 2025 arXiv: 2511.01704 Code: https://github.com/wudiqx106/LFRD2 Area: Depth Restoration / Computational Imaging Keywords: Under-display ToF imaging, fractional reaction-diffusion, continuous convolution, depth restoration, physics-driven

TL;DR

LFRD² proposes a hybrid framework that combines learnable time-fractional reaction-diffusion equations with neural networks for under-display ToF (UD-ToF) depth map restoration. The approach captures long-range memory dependencies across iterations via fractional calculus and introduces an efficient continuous convolution operator to replace discrete convolution, achieving state-of-the-art performance on UD-ToF depth restoration, ToF denoising, and depth super-resolution tasks.

Background & Motivation

  • The full-screen trend has driven the development of under-display sensors: under-display RGB cameras are already commercialized, with under-display ToF (UD-ToF) depth cameras as the next frontier.
  • Transparent OLED (TOLED) panels severely degrade ToF camera signals through signal attenuation, multi-path interference (MPI), temporal noise, and other artifacts, substantially reducing depth quality.
  • Traditional diffusion methods (e.g., P-M diffusion) exploit physical priors for depth refinement and offer robust adaptability and generalization, but suffer from complex parameter modeling and high computational cost.
  • Deep learning methods excel at high-level image understanding and contextual reasoning, yet rely heavily on architectural design and data quality, lacking physical interpretability.
  • Integer-order algorithm unrolling maps iterative algorithms to deep network layers, but relies on integer-order differential equations (IDEs) in which the predicted state depends only on the current state, ignoring historical information.
  • Fractional-order differential equations (FDEs) possess memory properties — the current state depends on all historical states — which better reflects real physical processes.
  • Core Motivation: Can neural networks be used to learn solutions to fractional reaction-diffusion equations, improving depth restoration quality while maintaining physical interpretability?

Method

Overall Architecture

LFRD² consists of two stages:

  1. Deep Initial State Builder (DISB): uses an existing network (e.g., UD-ToFnet) to generate an initial depth map \(u_0\).
  2. Deep Fractional Reaction-Diffusion Module: performs iterative depth refinement based on the Caputo fractional derivative.

Key Designs

  1. Fractional Reaction-Diffusion Dynamics:

    • Employs the Caputo fractional derivative of order \(0 < \alpha < 1\): \({}^C_0 D_t^\alpha u(t) = \frac{1}{\Gamma(1-\alpha)} \int_0^t (t-\tau)^{-\alpha} u'(\tau) d\tau\)
    • Discretized via the L1 approximation, yielding the iterative scheme: \(u_{n+1} = u_n + S\big[\operatorname{div}(g(|\nabla u_n|)\nabla u_n) + \lambda(u_0 - u_n)\big] - \sum_{k=1}^n a_k^{(\alpha)}(u_{n+1-k} - u_{n-k})\)
    • where \(S = \Gamma(2-\alpha)\,(\Delta t)^\alpha\) (with \(\Delta t\) the time step) and \(a_k^{(\alpha)} = (k+1)^{1-\alpha} - k^{1-\alpha}\).
    • Memory property: The current state \(u_{n+1}\) depends on all historical states \(u_0, \ldots, u_n\), rather than \(u_n\) alone.
    • The fractional order \(\alpha\) is dynamically predicted by the neural network, rather than being fixed a priori.
    • In the diffusion term \(\operatorname{div}(g(|\nabla u|)\nabla u)\), the function \(g(\cdot)\) is learned by the neural network rather than defined by a traditional conductance function.
    • The reaction term \(\lambda(u_0 - u_n)\) drives the depth evolution toward the target state.
  2. Efficient Continuous Convolution Operator:

    • Conventional discrete convolution disregards the continuity of natural scenes.
    • Existing continuous convolution implementations (MLP-based Neural Fields) incur high computational cost and complex hyperparameter tuning.
    • This paper exploits the repeated differentiation/integration property: \(u * \mathcal{K} = u^{(-n)} * \mathcal{K}^{(n)}\).
    • At \(n=2\), the estimated kernel \(\hat{\mathcal{K}}^{(2)}\) degenerates into a sparse set of Dirac deltas.
    • Novelty: Rather than predefining Gaussian kernels and control points, the DISB directly generates the Dirac delta.
    • The antiderivative of the signal is efficiently approximated as \(u^{(-2)} \approx A \cdot u(x_0, y_0) + B\), where coefficients \(A\) and \(B\) are predicted by a three-layer convolution network.
    • Compared to NFC (Neural Field Convolution), this design reduces FLOPs by 62% (7.69G vs. 20.5G) with faster inference speed (22.75ms vs. 28.42ms).
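The repeated-differentiation identity \(u * \mathcal{K} = u^{(-n)} * \mathcal{K}^{(n)}\) that this design builds on can be checked numerically in 1-D. The sketch below (plain NumPy, not the authors' implementation) takes the discrete analogue at \(n=2\): the double cumulative sum of the signal convolved with the second difference of the kernel reproduces direct convolution.

```python
import numpy as np

def conv_via_antiderivative(u, K):
    """Compute u * K as u^(-2) * K^(2): the double antiderivative of the
    signal convolved with the second difference of the kernel."""
    m = len(K)
    # Second difference of K with zero padding (discrete analogue of K'')
    d2K = np.diff(np.concatenate(([0.0, 0.0], K, [0.0, 0.0])), 2)
    # Double cumulative sum of u, zero-extended so its tail is not truncated
    u_ext = np.concatenate((u, np.zeros(m + 2)))
    u_int2 = np.cumsum(np.cumsum(u_ext))
    full = np.convolve(u_int2, d2K)
    return full[: len(u) + m - 1]   # trim to the length of a direct convolution

rng = np.random.default_rng(0)
u = rng.normal(size=64)
K = np.hanning(9)                    # any smooth kernel stands in here
ref = np.convolve(u, K)              # direct discrete convolution
out = conv_via_antiderivative(u, K)  # matches ref up to floating-point error
```

For a smooth kernel, \(\mathcal{K}^{(2)}\) concentrates on few significant samples, which is the sparsity the paper exploits; here the kernel and the coefficient-free antiderivative are illustrative stand-ins for the network-predicted quantities.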
  3. Physical Interpretability:

    • The entire iterative process encodes a time-fractional reaction-diffusion equation.
    • The non-local nature of fractional calculus provides an appropriate framework for describing dynamic processes with memory effects.
    • The neural network serves as an estimator of the fractional order, which can be viewed as a form of physics-informed neural networks (PINNs).
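The fractional update rule above can be sketched in plain NumPy. This is an illustrative toy, with a fixed order \(\alpha\), a hand-written Perona-Malik-style conductance \(g\), and a scalar \(\lambda\), whereas the paper learns \(\alpha\) and \(g\) with neural networks; it only shows the mechanics of the L1 scheme and its memory term.

```python
import numpy as np
from math import gamma

def memory_weights(n, alpha):
    # a_k^(alpha) = (k+1)^(1-alpha) - k^(1-alpha), for k = 1..n
    k = np.arange(1, n + 1, dtype=float)
    return (k + 1) ** (1 - alpha) - k ** (1 - alpha)

def frd_step(history, u0, alpha, dt=1.0, lam=0.01):
    """One L1-discretized fractional reaction-diffusion update.

    history: list [u_0, ..., u_n] of 2-D depth maps (most recent last).
    Returns u_{n+1}.
    """
    u_n = history[-1]
    # Diffusion term div(g(|grad u|) grad u); toy conductance (learned in the paper)
    gy, gx = np.gradient(u_n)
    g = 1.0 / (1.0 + (gx**2 + gy**2) / 0.1**2)
    div = np.gradient(g * gx, axis=1) + np.gradient(g * gy, axis=0)
    reaction = lam * (u0 - u_n)           # pulls the state toward the initial depth
    S = gamma(2 - alpha) * dt**alpha      # step scaling from the L1 discretization
    # Memory term: weighted differences over ALL past states
    n = len(history) - 1
    mem = np.zeros_like(u_n)
    if n > 0:
        a = memory_weights(n, alpha)
        for k in range(1, n + 1):
            mem += a[k - 1] * (history[n + 1 - k] - history[n - k])
    return u_n + S * (div + reaction) - mem

# Toy run: iterate a few refinement steps from a noisy initial depth map
rng = np.random.default_rng(0)
u0 = rng.normal(0.0, 1.0, (32, 32))
hist = [u0]
for _ in range(5):
    hist.append(frd_step(hist, u0, alpha=0.5))
```

Note how the weights \(a_k^{(\alpha)}\) decay with \(k\): older states still contribute, but with diminishing influence, which is exactly the long-range memory that an integer-order scheme (depending on \(u_n\) alone) discards.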

Loss & Training

  • Adam optimizer with initial learning rate \(1 \times 10^{-4}\) and batch size 16.
  • SUD-ToF trained for 250 epochs; RUD-ToF trained for 1000 epochs.
  • Original \(180 \times 240\) images are cropped to \(176 \times 240\) patches.
  • DISB is based on UD-ToFnet with its original settings preserved.
  • Reaction term coefficient \(\lambda = 0.01\).
  • Experiments conducted on an NVIDIA RTX 3090.

Key Experimental Results

Main Results (Tables)

SUD-ToF / RUD-ToF Datasets:

Method SUD-ToF MAE↓ SUD-ToF RMSE↓ RUD-ToF MAE↓ RUD-ToF RMSE↓
PE-ToF 9.77 15.92 21.22 48.76
NAFNet 11.08 18.24 20.41 33.83
Restormer 9.75 14.76 18.94 31.78
UD-ToFnet 8.88 11.50 17.29 31.11
LFRD² (Ours) 8.41 10.99 16.73 30.94

FLAT Dataset (ToF Denoising):

Method MAE↓ RMSE↓
SHARPnet 4.62 10.26
UD-ToFnet 4.41 8.23
LFRD² 4.13 7.35

NYUv2 Dataset (Depth Super-Resolution, MSE/MAE):

Method 4× 8× 16×
DSR-EI 2.94/0.49 13.3/1.19 57.0/2.70
LFRD² 2.85/0.47 12.8/1.16 52.3/2.58

Ablation Study (Tables)

Core Component Ablation (RUD-ToF):

Configuration Params/M FLOPs/G Speed/ms MAE↓ RMSE↓
Baseline (UD-ToFnet) 2.17 8.65 15.20 17.29 31.11
+ GRU +0.18 +7.62 19.89 17.02 31.09
+ LSTM +0.24 +10.4 22.15 16.96 31.22
+ LFRD² +0.18 +7.69 22.75 16.73 30.94
w/o FC (integer-order) +0.01 +0.41 20.67 17.00 30.99
w/o CC (no continuous conv.) +0.17 +7.28 22.11 16.88 31.03
NFC +0.13 +20.5 28.42 16.97 31.00

Fractional Order Ablation:

Order \(\alpha\) 0.1 0.3 0.5 0.7 0.9 Learnable (Ours)
MAE/mm 18.12 18.38 18.86 19.01 18.29 17.62
\(\rho_{1.02}\)/% 66.79 66.80 66.16 65.73 66.04 67.43

Key Findings

  • Fractional vs. integer order: Removing fractional calculus (w/o FC) increases MAE from 16.73 to 17.00, confirming the importance of memory properties.
  • Continuous vs. discrete convolution: Removing continuous convolution (w/o CC) increases MAE from 16.73 to 16.88.
  • vs. RNN variants: LFRD² outperforms both GRU and LSTM on MAE and RMSE, while maintaining parameter count comparable to GRU.
  • vs. NFC: Achieves comparable accuracy while reducing FLOPs by 62% and inference time by about 20% (22.75 ms vs. 28.42 ms).
  • Learnable order outperforms all fixed orders, dynamically adapting to the optimal order for each sample.
  • Cross-task generalization: The same framework achieves state-of-the-art results across three distinct tasks — UD-ToF restoration, ToF denoising, and depth super-resolution.
  • Plug-and-play: The DISB can be replaced by various baselines (PE-ToF, NAFNet, Restormer, etc.), consistently yielding improvements.

Highlights & Insights

  • Hybrid physics-driven and data-driven design: Embedding fractional PDEs into neural network iterations provides both physical interpretability and learning flexibility.
  • Memory property of fractional calculus is the key innovation: leveraging a weighted combination of all historical iteration states makes the approach more robust than integer-order methods that rely solely on the current state.
  • Dynamic fractional order learned by the network addresses the difficulty of manual parameter selection in traditional fractional-order methods.
  • The efficient continuous convolution implementation (coefficient prediction + repeated differentiation) is more practical than MLP-based Neural Fields.
  • Strong cross-task applicability: The same framework, without modification, applies to different depth restoration tasks.

Limitations & Future Work

  • Training requires careful configuration to avoid numerical instability (NaN values); robustness needs improvement.
  • The current explicit numerical scheme may benefit from implicit formulations for greater stability and efficiency.
  • The number of iteration steps is fixed; adaptive step-size control could further improve efficiency.
  • The choice of DISB affects final performance, but selection criteria for the optimal initializer are not thoroughly investigated.
  • Although more efficient than NFC, continuous convolution still increases inference time by approximately 50%.

Related Work & Context

  • Perona-Malik Diffusion: A classical image enhancement diffusion model serving as the integer-order baseline for this work.
  • TNRD (Chen & Pock, 2016): Trainable Nonlinear Reaction Diffusion, learning time-varying filter parameters from data.
  • UD-ToFnet (Qiao et al.): A pioneering work on UD-ToF depth restoration, serving as the DISB baseline in this paper.
  • NFC (Nsampi et al.): Continuous convolution implementation based on repeated differentiation, the reference method for the continuous convolution design.
  • Algorithm Unrolling: Unrolls iterative algorithms into network layers, forming the methodological foundation of LFRD².
  • Insight: The potential of fractional calculus in image processing remains largely underexplored.

Rating

  • Novelty: ⭐⭐⭐⭐ — Combining fractional reaction-diffusion with deep learning and the learnable-order design are both novel contributions.
  • Experimental Thoroughness: ⭐⭐⭐⭐ — Four datasets, three tasks, and detailed ablations covering core components, continuous convolution inputs, and order selection.
  • Writing Quality: ⭐⭐⭐⭐ — Rigorous mathematical derivations, clear illustrations, and thorough physical explanations.
  • Value: ⭐⭐⭐⭐ — Methodological contribution to physics-driven depth restoration with strong cross-task generalizability.