Learnable Fractional Reaction-Diffusion Dynamics for Under-Display ToF Imaging and Beyond¶
- Conference: ICCV 2025
- arXiv: 2511.01704
- Code: https://github.com/wudiqx106/LFRD2
- Area: Depth Restoration / Computational Imaging
- Keywords: Under-display ToF imaging, fractional reaction-diffusion, continuous convolution, depth restoration, physics-driven
TL;DR¶
LFRD² proposes a hybrid framework that combines learnable time-fractional reaction-diffusion equations with neural networks for under-display ToF (UD-ToF) depth map restoration. The approach captures long-range memory dependencies across iterations via fractional calculus and introduces an efficient continuous convolution operator to replace discrete convolution, achieving state-of-the-art performance on UD-ToF depth restoration, ToF denoising, and depth super-resolution tasks.
Background & Motivation¶
- The full-screen trend has driven the development of under-display sensors: under-display RGB cameras are already commercialized, with under-display ToF (UD-ToF) depth cameras as the next frontier.
- Transparent OLED (TOLED) panels severely degrade ToF camera signals through signal attenuation, multi-path interference (MPI), temporal noise, and other artifacts, substantially reducing depth quality.
- Traditional diffusion methods (e.g., P-M diffusion) exploit physical priors for depth refinement and offer robust adaptability and generalization, but suffer from complex parameter modeling and high computational cost.
- Deep learning methods excel at high-level image understanding and contextual reasoning, yet rely heavily on architectural design and data quality, lacking physical interpretability.
- Integer-order algorithm unrolling maps iterative algorithms to deep network layers, but relies on integer-order differential equations (IDEs) in which the predicted state depends only on the current state, ignoring historical information.
- Fractional-order differential equations (FDEs) possess memory properties — the current state depends on all historical states — which better reflects real physical processes.
- Core Motivation: Can neural networks be used to learn solutions to fractional reaction-diffusion equations, improving depth restoration quality while maintaining physical interpretability?
Method¶
Overall Architecture¶
LFRD² consists of two stages:

1. Deep Initial State Builder (DISB): uses an existing network (e.g., UD-ToFnet) to generate an initial depth map \(u_0\).
2. Deep Fractional Reaction-Diffusion Module: iteratively refines the depth map based on the Caputo fractional derivative.
Key Designs¶
- Fractional Reaction-Diffusion Dynamics:
- Employs the Caputo fractional derivative of order \(0 < \alpha < 1\): \({}^C_0 D_t^\alpha u(t) = \frac{1}{\Gamma(1-\alpha)} \int_0^t (t-\tau)^{-\alpha} u'(\tau) d\tau\)
- Discretized via the L1 approximation, yielding the explicit iterative scheme: \(u_{n+1} = u_n + S\,[\operatorname{div}(g(|\nabla u_n|)\nabla u_n) + \lambda(u_0 - u_n)] - \sum_{k=1}^{n} a_k^{(\alpha)}(u_{n+1-k} - u_{n-k})\)
- where \(S = \Gamma(2-\alpha)\,(\Delta t)^{\alpha}\) with time step \(\Delta t\), and \(a_k^{(\alpha)} = (k+1)^{1-\alpha} - k^{1-\alpha}\) (so \(a_0^{(\alpha)} = 1\)).
- Memory property: The current state \(u_{n+1}\) depends on all historical states \(u_0, \ldots, u_n\), rather than \(u_n\) alone.
- The fractional order \(\alpha\) is dynamically predicted by the neural network, rather than being fixed a priori.
- In the diffusion term \(\operatorname{div}(g(|\nabla u|)\nabla u)\), the conductance function \(g(\cdot)\) is learned by the neural network rather than fixed to a hand-crafted form (e.g., the Perona-Malik conductance).
- The reaction term \(\lambda(u_0 - u_n)\) drives the depth evolution toward the target state.
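To make the L1 recursion concrete, here is a minimal NumPy sketch of one explicit update step, under stated assumptions: scalar or array states, and a user-supplied `diffusion` callback standing in for the learned diffusion term. The names `l1_coeffs` and `fractional_rd_step` are illustrative, not the paper's API.

```python
import numpy as np
from math import gamma

def l1_coeffs(alpha, n):
    """L1 weights a_k = (k+1)^(1-alpha) - k^(1-alpha), for k = 0..n."""
    k = np.arange(n + 1, dtype=float)
    return (k + 1.0) ** (1.0 - alpha) - k ** (1.0 - alpha)

def fractional_rd_step(history, alpha, dt, diffusion, lam, u0):
    """One explicit L1 step of the fractional reaction-diffusion scheme.

    history = [u_0, ..., u_n]; diffusion(u) stands in for the learned
    term div(g(|grad u|) grad u) from the paper.
    """
    n = len(history) - 1
    u_n = history[-1]
    S = gamma(2.0 - alpha) * dt ** alpha
    a = l1_coeffs(alpha, n)
    # Memory term: weighted sum over ALL past increments (a_0 = 1
    # corresponds to the u_{n+1} - u_n increment on the left-hand side).
    mem = sum(a[k] * (history[n + 1 - k] - history[n - k])
              for k in range(1, n + 1))
    return u_n + S * (diffusion(u_n) + lam * (u0 - u_n)) - mem
```

Note that as \(\alpha \to 1\), the weights \(a_k^{(\alpha)}\) vanish for \(k \geq 1\) and the scheme collapses to the memoryless integer-order update, which is exactly the "w/o FC" ablation setting.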
- Efficient Continuous Convolution Operator:
- Conventional discrete convolution disregards the continuity of natural scenes.
- Existing continuous convolution implementations (MLP-based Neural Fields) incur high computational cost and complex hyperparameter tuning.
- This paper exploits the repeated differentiation/integration property: \(u * \mathcal{K} = u^{(-n)} * \mathcal{K}^{(n)}\).
- At \(n=2\), the estimated kernel \(\hat{\mathcal{K}}^{(2)}\) degenerates into a sparse sum of Dirac deltas.
- Novelty: Rather than predefining Gaussian kernels and control points, the DISB directly generates the Dirac delta.
- The second-order antiderivative of the signal is efficiently approximated as \(u^{(-2)} \approx A \cdot u(x_0, y_0) + B\), where the coefficient maps \(A\) and \(B\) are predicted by a three-layer convolutional network.
- Compared to NFC (Neural Field Convolution), this design reduces FLOPs by 62% (7.69G vs. 20.5G) with faster inference speed (22.75ms vs. 28.42ms).
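As a sanity check of the repeated-differentiation identity the operator builds on, the 1-D NumPy sketch below computes convolution with a fixed triangle kernel in two equivalent ways: directly, and as a double cumulative sum (\(u^{(-2)}\)) combined with the kernel's second difference, which is just three Dirac deltas. The paper instead learns the delta positions/weights and approximates the antiderivative with predicted coefficients; the function names here are illustrative.

```python
import numpy as np

def tri_filter_direct(u, w):
    """Reference: convolve with a triangle kernel of half-width w."""
    tri = np.convolve(np.ones(w), np.ones(w))  # length 2w - 1
    return np.convolve(u, tri)

def tri_filter_sparse(u, w):
    """Same result via u^{(-2)} * K^{(2)}: the triangle's second
    difference is three deltas with weights (+1, -2, +1) at 0, w, 2w."""
    # u^{(-2)}: double cumulative sum, zero-padded so shifts stay in range
    Q = np.cumsum(np.cumsum(np.concatenate([u, np.zeros(2 * w)])))
    out = np.empty(len(u) + 2 * w - 2)
    for i in range(len(out)):
        q0 = Q[i]
        q1 = Q[i - w] if i - w >= 0 else 0.0
        q2 = Q[i - 2 * w] if i - 2 * w >= 0 else 0.0
        out[i] = q0 - 2.0 * q1 + q2
    return out
```

The sparse path touches only three samples of the antiderivative per output position, regardless of kernel width, which is the source of the operator's efficiency advantage over dense convolution or MLP-based neural fields.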
- Physical Interpretability:
- The entire iterative process encodes a time-fractional reaction-diffusion equation.
- The non-local nature of fractional calculus provides an appropriate framework for describing dynamic processes with memory effects.
- The neural network serves as an estimator of the fractional order, which can be viewed as a form of physics-informed neural networks (PINNs).
Loss & Training¶
- Adam optimizer with initial learning rate \(1 \times 10^{-4}\) and batch size 16.
- SUD-ToF trained for 250 epochs; RUD-ToF trained for 1000 epochs.
- Original \(180 \times 240\) images are cropped to \(176 \times 240\) patches.
- DISB is based on UD-ToFnet with its original settings preserved.
- Reaction term coefficient \(\lambda = 0.01\).
- Experiments conducted on an NVIDIA RTX 3090.
Key Experimental Results¶
Main Results (Tables)¶
SUD-ToF / RUD-ToF Datasets:
| Method | SUD-ToF MAE↓ | SUD-ToF RMSE↓ | RUD-ToF MAE↓ | RUD-ToF RMSE↓ |
|---|---|---|---|---|
| PE-ToF | 9.77 | 15.92 | 21.22 | 48.76 |
| NAFNet | 11.08 | 18.24 | 20.41 | 33.83 |
| Restormer | 9.75 | 14.76 | 18.94 | 31.78 |
| UD-ToFnet | 8.88 | 11.50 | 17.29 | 31.11 |
| LFRD² (Ours) | 8.41 | 10.99 | 16.73 | 30.94 |
FLAT Dataset (ToF Denoising):
| Method | MAE↓ | RMSE↓ |
|---|---|---|
| SHARPnet | 4.62 | 10.26 |
| UD-ToFnet | 4.41 | 8.23 |
| LFRD² | 4.13 | 7.35 |
NYUv2 Dataset (Depth Super-Resolution, MSE/MAE):
| Method | 4× | 8× | 16× |
|---|---|---|---|
| DSR-EI | 2.94/0.49 | 13.3/1.19 | 57.0/2.70 |
| LFRD² | 2.85/0.47 | 12.8/1.16 | 52.3/2.58 |
Ablation Study (Tables)¶
Core Component Ablation (RUD-ToF):
| Configuration | Params/M | FLOPs/G | Speed/ms | MAE↓ | RMSE↓ |
|---|---|---|---|---|---|
| Baseline (UD-ToFnet) | 2.17 | 8.65 | 15.20 | 17.29 | 31.11 |
| + GRU | +0.18 | +7.62 | 19.89 | 17.02 | 31.09 |
| + LSTM | +0.24 | +10.4 | 22.15 | 16.96 | 31.22 |
| + LFRD² | +0.18 | +7.69 | 22.75 | 16.73 | 30.94 |
| w/o FC (integer-order) | +0.01 | +0.41 | 20.67 | 17.00 | 30.99 |
| w/o CC (no continuous conv.) | +0.17 | +7.28 | 22.11 | 16.88 | 31.03 |
| NFC | +0.13 | +20.5 | 28.42 | 16.97 | 31.00 |
Fractional Order Ablation:
| Order \(\alpha\) | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | Learnable (Ours) |
|---|---|---|---|---|---|---|
| MAE/mm | 18.12 | 18.38 | 18.86 | 19.01 | 18.29 | 17.62 |
| \(\rho_{1.02}\)/% | 66.79 | 66.80 | 66.16 | 65.73 | 66.04 | 67.43 |
Key Findings¶
- Fractional vs. integer order: Removing fractional calculus (w/o FC) increases MAE from 16.73 to 17.00, confirming the importance of memory properties.
- Continuous vs. discrete convolution: Removing continuous convolution (w/o CC) increases MAE from 16.73 to 16.88.
- vs. RNN variants: LFRD² outperforms both GRU and LSTM on MAE and RMSE, while maintaining parameter count comparable to GRU.
- vs. NFC: Achieves comparable accuracy while reducing FLOPs by 62% and improving speed by 25%.
- Learnable order outperforms all fixed orders, dynamically adapting to the optimal order for each sample.
- Cross-task generalization: The same framework achieves state-of-the-art results across three distinct tasks — UD-ToF restoration, ToF denoising, and depth super-resolution.
- Plug-and-play: The DISB can be replaced by various baselines (PE-ToF, NAFNet, Restormer, etc.), consistently yielding improvements.
Highlights & Insights¶
- Hybrid physics-driven and data-driven design: Embedding fractional PDEs into neural network iterations provides both physical interpretability and learning flexibility.
- Memory property of fractional calculus is the key innovation: leveraging a weighted combination of all historical iteration states makes the approach more robust than integer-order methods that rely solely on the current state.
- Dynamic fractional order learned by the network addresses the difficulty of manual parameter selection in traditional fractional-order methods.
- The efficient continuous convolution implementation (coefficient prediction + repeated differentiation) is more practical than MLP-based Neural Fields.
- Strong cross-task applicability: The same framework, without modification, applies to different depth restoration tasks.
Limitations & Future Work¶
- Training requires careful configuration to avoid numerical instability (NaN values); robustness needs improvement.
- The current explicit numerical scheme may benefit from implicit formulations for greater stability and efficiency.
- The number of iteration steps is fixed; adaptive step-size control could further improve efficiency.
- The choice of DISB affects final performance, but selection criteria for the optimal initializer are not thoroughly investigated.
- Although more efficient than NFC, continuous convolution still increases inference time by approximately 50%.
Related Work & Insights¶
- Perona-Malik Diffusion: A classical image enhancement diffusion model serving as the integer-order baseline for this work.
- TNRD (Chen & Pock, 2016): Trainable Nonlinear Reaction Diffusion, learning time-varying filter parameters from data.
- UD-ToFnet (Qiao et al.): A pioneering work on UD-ToF depth restoration, serving as the DISB baseline in this paper.
- NFC (Nsampi et al.): Continuous convolution implementation based on repeated differentiation, the reference method for the continuous convolution design.
- Algorithm Unrolling: Unrolls iterative algorithms into network layers, forming the methodological foundation of LFRD².
- Insight: The potential of fractional calculus in image processing remains largely underexplored.
Rating¶
- Novelty: ⭐⭐⭐⭐ — Combining fractional reaction-diffusion with deep learning and the learnable-order design are both novel contributions.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Four datasets, three tasks, and detailed ablations covering core components, continuous convolution inputs, and order selection.
- Writing Quality: ⭐⭐⭐⭐ — Rigorous mathematical derivations, clear illustrations, and thorough physical explanations.
- Value: ⭐⭐⭐⭐ — Methodological contribution to physics-driven depth restoration with strong cross-task generalizability.