Learning Single Index Models with Diffusion Priors
Conference: ICML2025
arXiv: 2505.21135
Code: Pending
Area: Diffusion Model Theory
Keywords: Diffusion Models, Signal Recovery, Single Index Models, Nonlinear Measurements, Inverse Problems, Compressed Sensing
TL;DR
The paper proposes an efficient method that uses diffusion model priors to recover signals from nonlinear measurements under the Semi-parametric Single Index Model (SIM). It needs only one round of unconditional sampling plus a partial inversion, requires no knowledge of the link function, and significantly outperforms existing methods on 1-bit and cubic measurements at a far lower NFE (150 in the FFHQ 1-bit experiment).
Background & Motivation
Traditional compressed sensing assumes a linear measurement model \(\boldsymbol{y} = \mathbf{A}\boldsymbol{x}^* + \boldsymbol{e}\), but in many practical problems the measurement process is nonlinear. The Single Index Model (SIM) is one of the most popular nonlinear measurement models:
\[\boldsymbol{y} = f(\mathbf{A}\boldsymbol{x}^*),\]
where \(f\) is an unknown and potentially discontinuous nonlinear link function applied element-wise. The objective is to reconstruct the signal \(\boldsymbol{x}^*\) using only the measurement matrix \(\mathbf{A}\) and the observations \(\boldsymbol{y}\), without knowledge of \(f\).
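To make the measurement model concrete, the following is a minimal sketch of how SIM observations could be simulated for two of the link functions discussed in this note (1-bit and cubic). The dimensions, the unit-norm signal, and the helper name `sim_measurements` are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def sim_measurements(x_star, m, link, rng):
    """Draw y = link(A @ x_star) with an i.i.d. standard Gaussian A of shape (m, n)."""
    n = x_star.shape[0]
    A = rng.standard_normal((m, n))
    y = link(A @ x_star)  # the link function acts element-wise
    return A, y

rng = np.random.default_rng(0)
n, m = 64, 256                       # illustrative sizes only
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)     # unit-norm signal (assumption of this sketch)

# 1-bit measurements: only the sign of each projection is kept.
A_1bit, y_1bit = sim_measurements(x_star, m, np.sign, rng)

# Cubic measurements: each projection is cubed.
A_cubic, y_cubic = sim_measurements(x_star, m, lambda z: z ** 3, rng)
```

A recovery method in this setting would receive only the pair `(A, y)` and would not be told which `link` produced `y`.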
Existing signal recovery work based on diffusion models (DMs) suffers from the following limitations:
- Methods like DPS, DAPS: Assume the link function \(f\) is known and differentiable, making them unable to process discontinuous functions (such as \(\text{sign}(\cdot)\)).
- QCS-SGM: Restricted to quantized compressed sensing and suffers from extremely slow reconstruction (more than ten thousand NFEs; 11555 in the FFHQ experiment reported in this note).
- DDRM, MCG, etc.: Mainly target linear settings.
The core motivation of this paper is: Can an efficient diffusion model method be designed to solve signal recovery under SIM without relying on knowledge of the link function?
Method
Core Idea: Treating \(\mathbf{A}^T\boldsymbol{y}/m\) as a Noisy Signal
The key observation of the paper stems from the following lemma (Lemma 2): under mild conditions on the SIM,
\[\left\|\frac{1}{m}\mathbf{A}^T\boldsymbol{y} - \mu\boldsymbol{x}^*\right\|_\infty = O\!\left(\sqrt{\frac{\log n}{m}}\right),\]
where \(\mu = \mathbb{E}[f(\boldsymbol{a}^T\boldsymbol{x}^*)\boldsymbol{a}^T\boldsymbol{x}^*]\). This indicates that \(\mathbf{A}^T\boldsymbol{y}/m\) is essentially a noisy version of \(\mu\boldsymbol{x}^*\), with a noise level proportional to \(1/\sqrt{m}\).
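The claim that \(\mathbf{A}^T\boldsymbol{y}/m\) behaves like a noisy copy of \(\mu\boldsymbol{x}^*\) can be checked numerically. The sketch below assumes i.i.d. standard Gaussian measurements, a unit-norm \(\boldsymbol{x}^*\), and the 1-bit link \(f = \text{sign}\), for which \(\mu = \mathbb{E}|g| = \sqrt{2/\pi}\) with \(g \sim \mathcal{N}(0, 1)\). The sizes are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)   # unit norm, so a^T x_star ~ N(0, 1)

mu = np.sqrt(2.0 / np.pi)          # E[sign(g) * g] = E|g| for g ~ N(0, 1)

def backprojection_error(m):
    """Max-abs deviation of A^T y / m from mu * x_star for 1-bit measurements."""
    A = rng.standard_normal((m, n))
    y = np.sign(A @ x_star)
    return float(np.max(np.abs(A.T @ y / m - mu * x_star)))

errors = {m: backprojection_error(m) for m in (500, 5_000, 50_000)}
for m, err in errors.items():
    rate = np.sqrt(np.log(n) / m)
    print(f"m={m:6d}  max-abs error={err:.4f}  sqrt(log n / m)={rate:.4f}")
```

In this toy setting the error should shrink roughly tenfold for every hundredfold increase in \(m\), which is the \(1/\sqrt{m}\) behaviour the method relies on.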
Comparison of Three Methods
The paper proposes three strategies, where the key difference lies in how they utilize the sampling \(G\) and inversion \(G^\dagger\) of the diffusion model:
| Method | Formula | Operation |
|---|---|---|
| SIM-DMFIS | \(\hat{\boldsymbol{x}} = G \circ G^\dagger(\mathbf{A}^T\boldsymbol{y}/m)\) | Complete inversion from \(\epsilon\) followed by complete sampling |
| SIM-DMS | \(\hat{\boldsymbol{x}} = G_{t^*}(\alpha_{t^*}C_s'\mathbf{A}^T\boldsymbol{y}/m)\) | Partial sampling (denoising) starting only from \(t^*\) |
| SIM-DMIS ⭐ | \(\hat{\boldsymbol{x}} = G \circ G^\dagger_{t^*}(\alpha_{t^*}C_s'\mathbf{A}^T\boldsymbol{y}/m)\) | Partial inversion from \(t^*\) to \(T\), followed by complete sampling |
Determination of the Intermediate Step \(t^*\)
By matching the noise level of \(\mathbf{A}^T\boldsymbol{y}/m\) with the noise schedule of the diffusion forward process, the intermediate step \(t^*\) is selected to satisfy:
\[\frac{\sigma_{t^*}}{\alpha_{t^*}} = \frac{C_s}{\sqrt{m}},\]
where \(\sigma_t/\alpha_t\) is the noise-to-signal ratio of the forward process at time \(t\) and \(C_s\) is a tunable parameter. This is a theoretically-driven design: greater noise (smaller \(m\)) shifts the inversion starting point closer to \(T\).
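As an illustration of this noise-level matching, the sketch below picks \(t^*\) on a time grid so that \(\sigma_t/\alpha_t\) is closest to \(C_s/\sqrt{m}\). The variance-preserving linear schedule, its constants, the grid, and the function names are all assumptions made for this example; the schedule actually used depends on the pre-trained model.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0   # assumed VP linear-beta schedule constants

def alpha_sigma(t):
    """Signal and noise coefficients (alpha_t, sigma_t) of the assumed VP schedule."""
    log_alpha = -0.25 * t ** 2 * (BETA_MAX - BETA_MIN) - 0.5 * t * BETA_MIN
    alpha = np.exp(log_alpha)
    sigma = np.sqrt(1.0 - alpha ** 2)
    return alpha, sigma

def select_t_star(m, c_s, eps=1e-3, T=1.0, grid_size=1000):
    """Grid point whose noise-to-signal ratio sigma_t / alpha_t is closest to c_s / sqrt(m)."""
    ts = np.linspace(eps, T, grid_size)
    alpha, sigma = alpha_sigma(ts)
    ratio = sigma / alpha                  # increases monotonically with t
    target = c_s / np.sqrt(m)
    return float(ts[np.argmin(np.abs(ratio - target))])

t_few = select_t_star(m=1_000, c_s=10.0)      # few measurements, noisier estimate
t_many = select_t_star(m=100_000, c_s=10.0)   # many measurements, cleaner estimate
print(t_few, t_many)
```

Under this schedule the selected step moves toward \(T\) as \(m\) decreases, consistent with the behaviour described above.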
Algorithmic Flow (SIM-DMIS)
- Input: Measurement matrix \(\mathbf{A}\), observations \(\boldsymbol{y}\), data prediction network \(\boldsymbol{x}_\theta\) of the pre-trained DM.
- Calculate the intermediate step \(t^*\) based on \(C_s/\sqrt{m}\).
- Construct the initial vector \(\alpha_{t^*}C_s'\mathbf{A}^T\boldsymbol{y}/m\).
- Execute partial inversion \(G^\dagger_{t^*}\) from \(t^*\) to \(T\) (using the DM2M second-order inversion method).
- Execute complete sampling \(G\) from \(T\) to \(\epsilon\) (using DDIM sampling).
- Output: Reconstructed signal \(\hat{\boldsymbol{x}}\).
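The steps above can be sketched end to end on a toy problem. The code below is only an illustration under strong assumptions: the pre-trained network is replaced by the closed-form posterior mean of a toy Gaussian prior \(\mathcal{N}(0, 2^2\mathbf{I})\), the schedule is an assumed VP linear schedule, and both inversion and sampling use plain first-order DDIM updates rather than the second-order inversion mentioned in the list. All function names and constants are hypothetical.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0   # assumed VP linear-beta schedule constants
PRIOR_STD = 2.0                  # toy prior: x_0 ~ N(0, PRIOR_STD^2 I)

def alpha_sigma(t):
    alpha = np.exp(-0.25 * t ** 2 * (BETA_MAX - BETA_MIN) - 0.5 * t * BETA_MIN)
    return alpha, np.sqrt(1.0 - alpha ** 2)

def x_theta(x_t, t):
    """Stand-in for the data prediction network: exact E[x_0 | x_t] for the toy prior."""
    alpha, sigma = alpha_sigma(t)
    return alpha * PRIOR_STD ** 2 * x_t / (alpha ** 2 * PRIOR_STD ** 2 + sigma ** 2)

def ddim_step(x_t, t, s):
    """One deterministic DDIM update from time t to time s (s < t samples, s > t inverts)."""
    alpha_t, sigma_t = alpha_sigma(t)
    alpha_s, sigma_s = alpha_sigma(s)
    x0_hat = x_theta(x_t, t)
    eps_hat = (x_t - alpha_t * x0_hat) / sigma_t
    return alpha_s * x0_hat + sigma_s * eps_hat

def integrate(x, t_start, t_end, steps):
    """Apply DDIM updates along a uniform time grid from t_start to t_end."""
    ts = np.linspace(t_start, t_end, steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        x = ddim_step(x, t, s)
    return x

def sim_dmis(A, y, t_star, c_s_prime, eps=1e-3, T=1.0, inv_steps=200, samp_steps=400):
    m = A.shape[0]
    alpha_star, _ = alpha_sigma(t_star)
    x_init = alpha_star * c_s_prime * (A.T @ y) / m   # initial vector placed at time t*
    x_T = integrate(x_init, t_star, T, inv_steps)     # partial inversion from t* to T
    return integrate(x_T, T, eps, samp_steps)         # complete sampling from T to eps

rng = np.random.default_rng(0)
n, m = 16, 400
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)
A = rng.standard_normal((m, n))
y = np.sign(A @ x_star)                               # 1-bit measurements
x_hat = sim_dmis(A, y, t_star=0.1, c_s_prime=1.0)
```

Because the toy prior is an isotropic Gaussian, the inversion and sampling maps reduce to a rescaling here, so the sketch shows the control flow rather than any reconstruction gain. With a real image prior, `x_theta` would be the pre-trained network and the two passes would pull the estimate toward the data manifold.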
Theoretical Analysis
| Theorem/Lemma | Content | Significance |
|---|---|---|
| Lemma 2 | \(\|\mathbf{A}^T\boldsymbol{y}/m - \mu\boldsymbol{x}^*\|_\infty = O(\sqrt{\log n/m})\) | Establishes the noise level estimate to guide the selection of \(t^*\) |
| Lemma 3 | Generator \(G\) is \(L\)-Lipschitz continuous when the data prediction network is Lipschitz (Assumption 1) | Ensures that errors are not amplified by the sampling process |
| Theorem 3 | \(\|\bar{\boldsymbol{x}}_\epsilon - G \circ G^\dagger_t(\bar{\boldsymbol{x}}_t)\|_2 = O(\sqrt{n}(h_{\max}^{k_2} + Lh_{\max}^{k_1}))\) | Error upper bound of SIM-DMIS, related to step size \(h_{\max}\) and numerical orders \(k_1, k_2\) |
| Assumption 1 | The data prediction network \(\boldsymbol{x}_\theta(\cdot, t)\) is \(L_t\)-Lipschitz with respect to its first argument | Standard assumption adopted by many theoretical works on DMs |
Because the bound in Theorem 3 shrinks as \(h_{\max}^{k_1}\) and \(h_{\max}^{k_2}\), the theory suggests that high-order numerical methods (\(k_1, k_2 \geq 2\)) can significantly reduce reconstruction error at a given step size.
Key Experimental Results
FFHQ 256×256, 1-bit Measurements (\(m = n/8\))
| Method | NFE | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|
| QCS-SGM | 11555 | 12.91 | 0.51 | 0.50 |
| DPS-N | 1000 | 11.14 | 0.37 | 0.69 |
| SIM-DMS | 50 | — | — | — |
| SIM-DMIS | 150 | Best | Best | Best |
Key Findings
- SIM-DMIS outperforms QCS-SGM while using only 150 NFE instead of 11555, roughly a 77x reduction in network evaluations.
- In 1-bit measurements, SIM-DMIS remains superior even though DPS-N and DAPS-N exploit knowledge of the link function \(f\).
- Partial inversion (SIM-DMIS) significantly outperforms complete inversion (SIM-DMFIS), validating the theoretical intuition of starting the inversion from the intermediate step \(t^*\).
- Consistent performance is achieved across FFHQ and ImageNet (CIFAR-10 is shown in the Appendix).
Highlights & Insights
- No Knowledge of Link Function Required: This is the core advantage. In reality, the link functions of nonlinear measurement models are often unknown or non-differentiable; this method bypasses this limitation entirely.
- Theoretically-Driven Intermediate Step Selection: By aligning the noise level of \(\mathbf{A}^T\boldsymbol{y}/m\) with the diffusion noise schedule \(\sigma_t/\alpha_t\) via Lemma 2, the inversion starting point \(t^*\) is elegantly determined.
- Extremely High Computational Efficiency: Requires only a single round of sampling + partial inversion (150 NFE), without iterative optimization or gradient computation.
- Counter-Intuitive Finding that Partial Inversion Outperforms Complete Inversion: Executing complete inversion starting from \(\epsilon\) incorrectly assumes that the input complies with the data distribution \(q_0\), whereas starting from \(t^*\) matched with the noise level is more reasonable.
- A Unified Framework to handle different nonlinear measurements (1-bit, cubic, quantization, etc.) without requiring individual designs for each measurement type.
Limitations & Future Work
- Not Applicable to Phase Retrieval: The condition \(\mu \neq 0\) excludes cases where \(f(x) = x^2\) or \(f(x) = |x|\).
- Tuning Dependencies: \(C_s\) and \(C_s'\) require tuning for different measurement models and datasets.
- Gap Between Theory and Practice: The error bound in Theorem 3 depends on the Lipschitz constant \(L\), while the actual \(L\) of DMs can be very large.
- Matrix Storage Overhead: Requires explicit storage of the \(m \times n\) measurement matrix \(\mathbf{A}\), which is unfriendly to high-resolution images.
- Unexplored Structured Measurement Matrices: Only i.i.d. Gaussian measurements are considered, whereas practical measurement matrices are usually structured.
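The \(\mu \neq 0\) condition in the first limitation can be illustrated with a quick Monte Carlo estimate of \(\mu = \mathbb{E}[f(g)\,g]\) for \(g \sim \mathcal{N}(0, 1)\), which is the form \(\mu\) takes when \(\mathbf{A}\) is i.i.d. standard Gaussian and \(\boldsymbol{x}^*\) has unit norm (an assumption of this sketch). The helper name and sample size are illustrative.

```python
import numpy as np

def estimate_mu(link, num_samples=1_000_000, seed=0):
    """Monte Carlo estimate of mu = E[f(g) * g] for g ~ N(0, 1)."""
    g = np.random.default_rng(seed).standard_normal(num_samples)
    return float(np.mean(link(g) * g))

links = {
    "sign (1-bit)": np.sign,          # exact value: sqrt(2 / pi)
    "cubic": lambda z: z ** 3,        # exact value: E[g^4] = 3
    "square": lambda z: z ** 2,       # exact value: E[g^3] = 0
    "absolute value": np.abs,         # exact value: E[|g| g] = 0
}
mus = {name: estimate_mu(f) for name, f in links.items()}
for name, value in mus.items():
    print(f"{name:>15s}: mu ~ {value:+.3f}")
```

For even link functions such as \(x^2\) and \(|x|\), the product \(f(g)\,g\) is odd in \(g\), so \(\mu\) vanishes and \(\mathbf{A}^T\boldsymbol{y}/m\) carries no first-order information about the direction of \(\boldsymbol{x}^*\).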
Related Work & Insights
- DPS (Chung et al., 2023): Signal recovery based on posterior sampling, requiring a known forward model.
- DAPS (Zhang et al., 2024): Extends DPS to nonlinear settings, but still requires \(f\) to be differentiable.
- QCS-SGM (Meng & Kabashima, 2022): Uses SGM for quantized compressed sensing, but requires more than ten thousand NFEs.
- CSGM (Bora et al., 2017): Pioneered signal recovery using generative model priors.
- The proposed method can inspire other inverse problems: as long as observations can be expressed as a noisy version of the signal, the partial inversion + sampling framework of diffusion models can be utilized.
Rating
- Novelty: ⭐⭐⭐⭐ — The idea of determining the inversion starting point from the perspective of noise level matching is novel and theoretically sound.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Comprehensive comparisons across multiple datasets, measurement models, and baselines, with extensive ablations included in the Appendix.
- Writing Quality: ⭐⭐⭐⭐ — Theoretical derivations are clear, notations are standard, and comparisons among the three methods are visually intuitive.
- Value: ⭐⭐⭐⭐ — Provides an efficient and general diffusion model solution for nonlinear inverse problems.