Regularizing INR with Diffusion Prior for Self-Supervised 3D Reconstruction of Neutron CT Data¶
Conference: CVPR 2025
arXiv: 2603.10947
Code: To be released
Area: 3D Vision / CT Reconstruction
Keywords: Implicit Neural Representation, Diffusion Prior, Neutron CT, Sparse-view Reconstruction, Self-supervised 3D Reconstruction
TL;DR¶
This paper proposes DINR (Diffusive INR), which combines an implicit neural representation (INR, here a SIREN) with a pretrained diffusion model prior. At each DDIM timestep, a proximal loss pulls the INR reconstruction toward the diffusion model's denoised output. On sparse-view neutron CT, DINR achieves higher PSNR than FBP and pure INR and is competitive with DD3IP and classical MBIR (qGGMRF), with its clearest gains in the ultra-sparse regime (4-5 views).
Background & Motivation¶
Background: Neutron CT is an important imaging modality for characterizing the interior of samples based on their hydrogen distribution (e.g., fuel cells, lithium-ion batteries, concrete structures). However, low neutron flux leads to long exposure times, so sparse-view reconstruction is needed to accelerate acquisition.
Limitations of Prior Work: FBP produces severe artifacts under sub-Nyquist sampling; MBIR with handcrafted priors (e.g., TV/qGGMRF) requires extensive parameter tuning and has limited expressive power; pure INR (SIREN) lacks strong image priors, resulting in unstable high-frequency reconstruction.
Key Challenge: Diffusion models can model complex image priors, but directly applying them to posterior sampling in inverse problems (e.g., DD3IP/SCD) does not fully exploit data consistency. Conversely, INR can flexibly integrate forward models but lacks learned priors.
Goal: Combine the generative power of diffusion priors with the data-consistency advantages of INR to achieve high-quality sparse-view neutron CT reconstruction.
Key Insight: Based on the modular design of the DD3IP framework, INR is used to replace the original data-driven inverse problem solver (DIS) within each DDIM timestep, incorporating the diffusion denoising estimate via a proximal loss.
Core Idea: Embed the INR as a differentiable inverse solver within the DD3IP diffusion framework, so that the diffusion prior guides the INR online through proximal regularization.
Method¶
Overall Architecture¶
DINR operates within the DD3IP framework. It first initializes the INR parameters \(\phi_T\) using pure data consistency. Then, in the reverse diffusion loop from \(t=T\) to \(t=1\), each step (1) adapts the diffusion model weights \(\theta_{t-1}\) (SCD) to obtain the denoised estimate \(\hat{x}_{0|t}\), (2) optimizes the INR parameters \(\phi_{t-1}\) with a proximal loss, and (3) advances to the next step via DDIM sampling.
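The control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `denoise` and `fit_inr` are hypothetical callables standing in for the SCD-adapted diffusion denoiser and the proximal INR optimization, `alphas` is assumed to be a cumulative noise schedule with `alphas[0] = 1`, and feeding the INR output into the deterministic (eta = 0) DDIM update is an assumption about how the pieces connect.

```python
import numpy as np

def ddim_step(x_t, x0_hat, alpha_t, alpha_prev):
    """Deterministic DDIM update (eta = 0) given a clean-image estimate x0_hat."""
    # Noise implied by the current sample and the clean estimate.
    eps_hat = (x_t - np.sqrt(alpha_t) * x0_hat) / np.sqrt(1.0 - alpha_t)
    # Re-noise the clean estimate to the previous (less noisy) level.
    return np.sqrt(alpha_prev) * x0_hat + np.sqrt(1.0 - alpha_prev) * eps_hat

def dinr_reverse_loop(x_T, alphas, denoise, fit_inr, rho):
    """Skeleton of the reverse loop; denoise and fit_inr are hypothetical hooks."""
    x_t = x_T
    # Initialization: fit the INR with data consistency only (rho = 0).
    x_inr = fit_inr(None, None, 0.0)
    for t in range(len(alphas) - 1, 0, -1):
        x0_hat = denoise(x_t, t)               # stand-in for the SCD-adapted denoiser
        x_inr = fit_inr(x_inr, x0_hat, rho)    # warm-started proximal INR update
        x_t = ddim_step(x_t, x_inr, alphas[t], alphas[t - 1])
    return x_inr
```

Passing the two stages in as callables mirrors the modular structure the summary attributes to DD3IP: either stage can be swapped without touching the loop.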
Key Designs¶
- Proximal INR Loss Function:
  - Function: Adds a proximal regularization term, which pulls the INR output toward the diffusion denoising output, to the standard data-consistency term (projection-domain MSE).
  - Formula: \(\mathcal{L}_\phi(S, y, \hat{x}_{0|t}, \rho) = \text{MSE}(A F_\phi(S, A^*y), y) + \rho \cdot \text{MSE}(\hat{x}_{0|t}, F_\phi(S, A^*y))\)
  - Design Motivation: \(\rho\) controls the strength of the diffusion prior. At initialization, \(\rho=0\) (pure data fitting); in subsequent timesteps, the diffusion estimate provides progressively cleaner prior guidance.
- INR Architecture (SIREN + FBP Input):
  - Function: Maps 3D coordinates to attenuation coefficients using SIREN (a sine-activated MLP).
  - Mechanism: Accepts the FBP reconstruction \(A^*y\) as an additional input channel to provide an initial estimate and accelerate convergence.
  - Design Motivation: The dual inputs give the INR both precise coordinate localization (from the coordinates) and coarse structural information (from the FBP).
- Noise Injection Scaling \(\omega\):
  - Function: Controls the relative scale of the FBP reconstruction \(A^*y\) and the noise \(\epsilon\) when initializing the reverse diffusion.
  - Formula: \(x_T \leftarrow \sqrt{\alpha_T}\, A^*y + \omega \sqrt{1-\alpha_T}\, \epsilon\)
  - Design Motivation: \(\omega\) is a tunable parameter that balances the low-frequency initial estimate against random exploration.
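The proximal loss and the scaled initialization can be written out in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's code: a small dense matrix `A` stands in for the parallel-beam projector, `A.T @ y` plays the role of \(A^*y\), `x_inr` stands for the flattened INR output \(F_\phi(S, A^*y)\), and the function names are hypothetical.

```python
import numpy as np

def proximal_inr_loss(x_inr, A, y, x_denoised, rho):
    """Projection-domain MSE plus a rho-weighted MSE toward the diffusion estimate."""
    data_term = np.mean((A @ x_inr - y) ** 2)        # MSE(A F_phi, y)
    prox_term = np.mean((x_denoised - x_inr) ** 2)   # MSE(x_hat_{0|t}, F_phi)
    return data_term + rho * prox_term

def init_x_T(fbp, alpha_T, omega, rng):
    """x_T = sqrt(alpha_T) * A^*y + omega * sqrt(1 - alpha_T) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(fbp.shape)
    return np.sqrt(alpha_T) * fbp + omega * np.sqrt(1.0 - alpha_T) * eps

# Toy usage with a random 6x4 "projector" (illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
x_true = rng.standard_normal(4)
y = A @ x_true
loss_data_only = proximal_inr_loss(x_true, A, y, np.zeros(4), rho=0.0)
x_T = init_x_T(A.T @ y, alpha_T=0.01, omega=0.5, rng=rng)
```

With `rho=0.0` the loss reduces to the data term alone, which matches the initialization described for \(\phi_T\); setting `omega=0.0` removes the noise and leaves only the scaled initial estimate.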
Loss & Training¶
- The diffusion model is pretrained on synthetic ellipsoid data (without needing real neutron CT data).
- SCD adapts the diffusion model weights at each timestep by minimizing \(\text{MSE}(A D_\theta(x_t|y), y)\).
- The INR is re-optimized at each step using the proximal loss; the distance-driven parallel-beam projector is implemented with Tomosipo.
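To give a sense of what synthetic ellipsoid training data might look like, the sketch below draws 2D slices made of random overlapping ellipses. The paper's actual generator is not described here, so every choice (2D slices rather than 3D volumes, the parameter ranges, additive intensities, the function name `random_ellipse_phantom`) is an assumption for illustration.

```python
import numpy as np

def random_ellipse_phantom(size=64, n_ellipses=5, seed=0):
    """Return a size x size image built from randomly placed, rotated ellipses."""
    rng = np.random.default_rng(seed)
    # Pixel-centre coordinates on [-1, 1] x [-1, 1].
    yy, xx = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    img = np.zeros((size, size))
    for _ in range(n_ellipses):
        cx, cy = rng.uniform(-0.5, 0.5, size=2)   # centre
        a, b = rng.uniform(0.1, 0.4, size=2)      # semi-axes
        theta = rng.uniform(0.0, np.pi)           # rotation angle
        # Rotate coordinates into the ellipse frame.
        xr = (xx - cx) * np.cos(theta) + (yy - cy) * np.sin(theta)
        yr = -(xx - cx) * np.sin(theta) + (yy - cy) * np.cos(theta)
        mask = (xr / a) ** 2 + (yr / b) ** 2 <= 1.0
        img[mask] += rng.uniform(0.1, 1.0)        # additive "attenuation"
    return img

phantom = random_ellipse_phantom(size=64, n_ellipses=5, seed=0)
```

Fixing the seed makes each phantom reproducible, which is convenient when the same slices are reused for training and for evaluation.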
Key Experimental Results¶
Synthetic Data (256×256, 2 slices)¶
Each cell reports PSNR (dB) / SSIM.

| Number of Views | FBP | INR (SIREN) | DD3IP | DINR |
|---|---|---|---|---|
| 4 views | 19.31/0.08 | 14.76/0.18 | 26.17/0.25 | 26.27/0.24 |
| 8 views | 21.67/0.18 | 28.15/0.35 | 28.37/0.34 | 28.56/0.38 |
| 16 views | 25.27/0.30 | 30.34/0.54 | 31.21/0.61 | 31.30/0.63 |
| 32 views | 29.62/0.43 | 32.85/0.66 | 32.91/0.74 | 33.43/0.76 |
Real Neutron CT Data¶
Each cell reports PSNR (dB) / SSIM.

| Number of Views | FBP | MBIR (qGGMRF) | INR | DD3IP | DINR |
|---|---|---|---|---|---|
| 5 views | 19.9/0.10 | 21.02/0.04 | 20.18/0.03 | 20.89/0.06 | 21.27/0.05 |
| 9 views | 22.9/0.33 | 26.0/0.38 | 24.08/0.27 | 25.41/0.34 | 25.22/0.35 |
| 17 views | 25.91/0.55 | 28.1/0.58 | 27.3/0.54 | 28.04/0.62 | 27.56/0.62 |
| 33 views | 30.11/0.73 | 31.0/0.77 | 29.7/0.71 | 31.19/0.79 | 31.37/0.77 |
Ablation Study / ROI Analysis¶
| ROI Size | Observation |
|---|---|
| 8×8 to 32×32 (Microstructure region) | DINR significantly outperforms other methods |
| 48×48 to 64×96 (Including background) | MBIR is close to or outperforms DINR |
- DINR gives the best reconstruction of microstructural details (pores/boundaries), but its advantage diminishes in large homogeneous background areas.
- This aligns with the inherent advantage of MBIR's qGGMRF prior in smooth regions.
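The ROI comparison can be illustrated by computing PSNR on a cropped window instead of the full image. The sketch below is an assumption about how such an evaluation might be set up, not the paper's protocol: the helper names are hypothetical, and `data_range` is passed explicitly so that ROI and full-image values share the same peak.

```python
import numpy as np

def psnr(ref, rec, data_range):
    """Peak signal-to-noise ratio in dB; returns inf for a perfect match."""
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def roi_psnr(ref, rec, top, left, height, width, data_range):
    """PSNR restricted to the window [top:top+height, left:left+width]."""
    window = (slice(top, top + height), slice(left, left + width))
    return psnr(ref[window], rec[window], data_range)

# Toy usage: an error confined to one 8x8 corner of a 64x64 image.
ref = np.linspace(0.0, 1.0, 64 * 64).reshape(64, 64)
rec = ref.copy()
rec[:8, :8] += 0.1
full_image_psnr = psnr(ref, rec, data_range=1.0)
corner_psnr = roi_psnr(ref, rec, 0, 0, 8, 8, data_range=1.0)
```

Because the full-image score averages the error over many untouched pixels, it comes out higher than the score on the damaged corner, which is the effect the ROI analysis is meant to expose.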
Key Findings¶
- The diffusion model pretrained only on synthetic ellipsoids can effectively guide the reconstruction of real concrete microstructures, demonstrating OOD (out-of-distribution) adaptability.
- The advantage of DINR is most pronounced in the ultra-sparse range (4-5 views), where data constraints are extremely weak and a strong prior is most critical.
- The proximally regularized INR is more flexible than the conjugate gradient DIS of DD3IP, because the physical forward model can be integrated directly into the INR loss.
- Better quantitative metrics are needed: PSNR/SSIM lack sufficient discriminative power for evaluating microstructure reconstruction quality.
Highlights & Insights¶
- Modular Diffusion-INR Fusion: The modular design of the DIS within the DD3IP framework allows INR to replace other solvers in a plug-and-play manner, which makes the framework easy to extend.
- Synthetic Pretraining + OOD Inference: The diffusion model pretrained solely on synthetic data can guide real data reconstruction, reducing the reliance on in-domain training data.
- Insight from ROI Analysis: Conventional full-image PSNR can obscure the true advantages of a method in crucial regions (microstructures), highlighting the need for task-oriented evaluation.
Limitations & Future Work¶
- High computational overhead: both the INR parameters and the diffusion model weights must be optimized at each DDIM timestep.
- On real data, DINR does not outperform MBIR at every view count (MBIR has higher PSNR at 9 and 17 views).
- \(\rho\) and \(\omega\) require a careful parameter search; the authors note that a more comprehensive search could yield better results.
- Only parallel-beam geometry was validated; it has not been extended to cone-beam or helical CT.
- Lack of comparison with other learning-based CT reconstruction methods (e.g., end-to-end U-Net).
Related Work & Insights¶
- vs DD3IP: DINR replaces the CG iterations in DD3IP with INR as the DIS, achieving better performance under ultra-sparse views.
- vs MBIR+qGGMRF: MBIR remains competitive at moderate view counts but requires exhaustive grid search for regularization parameters; DINR is superior under ultra-sparse views and in microstructural regions.
- vs Pure SIREN/INR: Pure INR lacks a strong prior, which leads to high-frequency instabilities; the diffusion regularization in DINR addresses this issue.
- The work is directly relevant to researchers in industrial and scientific CT who need acquisitions with few views.
Rating¶
- Novelty: ⭐⭐⭐⭐ Embedding INR as DIS within DD3IP is a novel fusion approach.
- Experimental Thoroughness: ⭐⭐⭐ The data scale is small (2 slices), and there is a lack of ablation studies and more baselines.
- Writing Quality: ⭐⭐⭐ The method is described clearly, but the experimental analysis could be deeper.
- Value: ⭐⭐⭐⭐ Holds practical value for the field of scientific CT reconstruction.