Spiking Brain Compression: Post-Training Second-Order Compression for Spiking Neural Networks¶
Conference: NeurIPS 2025 arXiv: 2506.03996 Code: Not available Area: Model Compression Keywords: Spiking Neural Networks, Post-Training Compression, Hessian Matrix, Unstructured Pruning, Quantization
TL;DR¶
This paper proposes Spiking Brain Compression (SBC), a second-order post-training one-shot compression framework based on the Van Rossum Distance, designed specifically for spiking neural networks (SNNs). By introducing a Surrogate Membrane Potential (SMP) Hessian, SBC enables efficient module-wise pruning and quantization, and for the first time compresses SEW-ResNet152 and Spike-Driven Transformer at the ImageNet scale.
Background & Motivation¶
SNNs communicate via discrete spikes and are naturally suited for deployment on neuromorphic chips such as TrueNorth, Loihi, and SpiNNaker. However, these chips have limited memory and computational capacity, necessitating compression of SNNs.
Limitations of Prior Work:
Training-time pruning methods (e.g., LTH, UPF, STDS) require multiple iterative compression–training cycles, incurring prohibitive computational costs for large pretrained SNNs such as SEW-ResNet152 and Spiking Transformer.
Direct transfer of ANN post-training compression methods to SNNs performs poorly: methods such as Optimal Brain Compression (OBC) define their loss functions on continuous-valued layer outputs (the current domain), whereas the actual output of an SNN is a spike train, creating a fundamental objective mismatch.
Post-training quantization (PTQ) for SNNs lags behind: unlike the ANN domain, which has mature methods such as GPTQ, equivalent tools for SNNs remain underdeveloped.
The core motivation is to develop a one-shot post-training compression method that harnesses the accuracy advantages of second-order optimization while faithfully capturing the spike dynamics of SNNs.
Method¶
Overall Architecture¶
SBC operates at the module level, where each module consists of a Linear/Conv(+BN) layer followed by a LIF layer. After folding BatchNorm into Conv to form a single linear mapping, each module becomes Linear(W) → LIF. Given an input spike tensor \(X \in \{0,1\}^{T \times d_{\text{in}}}\), the module produces an output spike train \(S = f(X, W) \in \{0,1\}^{T \times d_{\text{out}}}\). The compression objective is to find compressed weights \(\hat{W}\) (sparse or quantized) whose output spike train \(\hat{S} = f(X, \hat{W})\) stays as close as possible to the original \(S\), measured by the Van Rossum Distance loss defined below.
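For concreteness, here is a minimal sketch of such a module's forward pass, assuming a standard hard-reset LIF whose leak matches the decay kernel used later (\(u[t] = (1 - 1/\tau_m)\,u[t-1] + i[t]/\tau_m\)); the function and parameter names are illustrative, not the paper's implementation:

```python
import numpy as np

def lif_module_forward(X, W, v_th=1.0, tau_m=2.0):
    """One SBC-style module: Linear(W) -> LIF, unrolled over T timesteps.

    X: binary input spikes of shape (T, d_in); W: weights of shape (d_in, d_out).
    Returns the output spike train S = f(X, W) of shape (T, d_out).
    """
    T = X.shape[0]
    d_out = W.shape[1]
    u = np.zeros(d_out)                               # membrane potential U[t]
    S = np.zeros((T, d_out))
    for t in range(T):
        i_t = X[t] @ W                                # synaptic current from the linear layer
        u = (1.0 - 1.0 / tau_m) * u + i_t / tau_m     # leaky integration
        S[t] = (u >= v_th).astype(float)              # spike: S[t] = Θ(U[t] - V_th)
        u = u * (1.0 - S[t])                          # hard reset where a spike fired
    return S
```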
Key Designs¶
- **Van Rossum Distance (VRD) Loss:** Because spike trains are discrete 0/1 signals, a plain L2 norm ignores the temporal distance between spikes. SBC therefore uses the squared VRD as its loss: \(\mathcal{L}(\hat{W}) = \|MS - M\hat{S}\|_2^2\), where \(M\) is the convolution matrix generated by the decay kernel \(k[t] = (1 - 1/\tau_m)^t \cdot (1/\tau_m)\). A key property is that the VRD decomposes independently per LIF neuron, \(\|MS - M\hat{S}\|_2^2 = \sum_{j=1}^{d_{\text{out}}} \|MS_{:,j} - M\hat{S}_{:,j}\|_2^2\), enabling parallelized Hessian computation across neurons (see the sketch after this list).
- **Surrogate Membrane Potential (SMP) Hessian:** The spike function \(S[t] = \Theta(U[t] - V_{th})\) is non-differentiable. Inspired by surrogate gradients, the derivative of the Heaviside function is replaced by a constant function \(g(u) = c\); because the OBS framework depends only on the relative magnitudes of Hessian entries, \(c\) cancels out and can be set to 1. Since \(g'(u) = 0\), the second-order term vanishes and the SMP Hessian reduces to \(\mathbf{H}_{\text{SMP}} = E_X[2(MX)^\top MX]\). This is precisely the exact Hessian of the spike-free membrane-potential least-squares loss \(\|MXw - MX\hat{w}\|_2^2\). Compared with the OBC Hessian \(\mathbf{H}_{\text{OBC}} = E_X[2X^\top X]\), the SMP Hessian incorporates the convolution matrix \(M\) and thus captures spike temporal dynamics more faithfully.
- **SBC Pruning Algorithm** (a sketch of the grouped update follows this list):
    - Adaptive sparsity allocation: LAMP scores are used to derive each module's pruning ratio from the global sparsity target.
    - Weight ordering (Step 1): OBS is applied per neuron to rank weights by ascending loss contribution; \(B_{\text{in}}\) weights are pruned per batch and the inverse Hessian is updated via the Woodbury identity. Time complexity: \(O(d_{\text{in}}^3 / B_{\text{in}})\).
    - Weight pruning (Step 2): a mask is generated from the ranking, and the remaining weights are updated in one shot with the grouped OBS formula \(\delta_{\mathbb{P}} = -\mathbf{H}^{-1}_{:,\mathbb{P}}\big((\mathbf{H}^{-1})_{\mathbb{P},\mathbb{P}}\big)^{-1}\mathbf{w}_{\mathbb{P}}\), where \(\mathbb{P}\) indexes the pruned weights of a neuron.
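As referenced above, a minimal sketch of how the VRD convolution matrix \(M\) and the SMP Hessian \(E_X[2(MX)^\top MX]\) could be formed from a calibration set; array shapes and names are assumptions for illustration:

```python
import numpy as np

def vrd_kernel_matrix(T, tau_m=2.0):
    """Lower-triangular convolution matrix M built from the decay kernel
    k[t] = (1 - 1/tau_m)^t * (1/tau_m), so (M S)[t] = sum_{s<=t} k[t-s] S[s]."""
    k = (1.0 - 1.0 / tau_m) ** np.arange(T) * (1.0 / tau_m)
    M = np.zeros((T, T))
    for t in range(T):
        M[t, : t + 1] = k[: t + 1][::-1]      # M[t, s] = k[t - s]
    return M

def smp_hessian(X_calib, M):
    """SMP Hessian H = E_X[2 (MX)^T (MX)], estimated over calibration inputs.
    X_calib: spike inputs of shape (N, T, d_in); returns (d_in, d_in)."""
    H = np.zeros((X_calib.shape[-1], X_calib.shape[-1]))
    for X in X_calib:
        MX = M @ X                             # temporally filtered input, (T, d_in)
        H += 2.0 * MX.T @ MX
    return H / len(X_calib)
```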
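And a sketch of the one-shot grouped OBS update (Step 2) using the inverse SMP Hessian; `prune_idx` and the helper name are hypothetical:

```python
import numpy as np

def obs_prune_group(w, H_inv, prune_idx):
    """Zero the weights indexed by prune_idx for one neuron and compensate the rest:
    delta_P = -H^{-1}[:, P] ((H^{-1})[P, P])^{-1} w[P].

    w: one neuron's weight vector (d_in,); H_inv: inverse SMP Hessian (d_in, d_in).
    """
    P = np.asarray(prune_idx)
    delta = -H_inv[:, P] @ np.linalg.solve(H_inv[np.ix_(P, P)], w[P])
    w_new = w + delta
    w_new[P] = 0.0                             # enforce exact zeros on the pruned set
    return w_new
```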
Loss & Training¶
- SBC is a training-free one-shot post-training method requiring only a small calibration dataset to estimate the Hessian.
- The quantization algorithm follows the GPTQ framework, replacing the layer-wise Hessian with \(\mathbf{H}_{\text{SMP}}\), and supports symmetric uniform quantization (see the sketch after this list).
- Calibration set ablations show that as few as 1–10 samples per class suffice for stable performance (std \(\leq 0.5\%\)).
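As a rough illustration of the quantization path (a symmetric uniform grid plus GPTQ/OBQ-style error compensation driven by the inverse SMP Hessian), under the same illustrative naming; this is a simplified sketch, not the paper's implementation:

```python
import numpy as np

def sbc_style_quantize_row(w, H_inv, bits=4):
    """Quantize one neuron's weights column by column on a symmetric uniform grid,
    pushing each rounding error onto the not-yet-quantized weights via H^{-1}
    (the GPTQ/OBQ update, with the SMP Hessian replacing the layer-wise one).
    A full GPTQ implementation also keeps H^{-1} consistent after each step,
    e.g. via its Cholesky factor; that bookkeeping is omitted here.
    """
    w = w.astype(float).copy()
    q_max = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / q_max          # one symmetric grid per row (sketch)
    for j in range(len(w)):
        q = np.clip(np.round(w[j] / scale), -q_max, q_max) * scale
        err = (w[j] - q) / H_inv[j, j]
        w[j] = q
        w[j + 1:] -= err * H_inv[j + 1:, j]    # compensate remaining weights
    return w
```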
Key Experimental Results¶
Main Results — One-Shot Post-Training Pruning (vs. ExactOBS and MBP)¶
| Dataset / Model | Sparsity | SBC Acc. Change | ExactOBS Acc. Change | MBP Acc. Change |
|---|---|---|---|---|
| N-MNIST / 2FC | 97% | -1.59% | -45.07% | Worse |
| DVS128-Gesture / 5Conv2FC | 97% | -1.74% | -9.38% | Worse |
| CIFAR-100 / VGG16-SNN | 75% | +7.47% vs. ExactOBS | — | — |
| ImageNet / SEW-ResNet152* | Multi-level | SOTA | Not achievable | Not achievable |
| ImageNet / Spike-Driven Transformer* | Multi-level | SOTA | Not achievable | Not achievable |
(*First large SNN architectures to be successfully compressed.)
Quantization Results¶
| Dataset / Architecture | Bit-width | SBC | ExactOBS | RTN |
|---|---|---|---|---|
| N-MNIST / 2FC | 4-bit | 98.14% | 98.16% | 97.58% |
| N-MNIST / 2FC | 2-bit | 92.40% | 64.29% | 20.34% |
| CIFAR10-DVS / 4Conv2FC | 3-bit | 69.64% | 67.56% | 52.70% |
| DVS128-Gesture / 5Conv2FC | 2-bit | 64.79% | 62.78% | 53.82% |
Comparison with Iterative Pruning Methods¶
| Dataset / Method | Sparsity | Accuracy | Relative Pruning Time |
|---|---|---|---|
| CIFAR-100 / LTH-IMP | 68.30% | Baseline | 100× |
| CIFAR-100 / SBC | 68.30% | +2.02% | 1× |
| ImageNet / UPF | — | Baseline | 1000× |
| ImageNet / SBC+FT | — | ≈ UPF | 1× |
Ablation Study — Calibration Set Size¶
| Dataset / Model | Calibration Samples | Accuracy |
|---|---|---|
| CIFAR10-DVS / 4Conv2FC | 10 | 49.64% (±1.27) |
| CIFAR10-DVS / 4Conv2FC | 90 | 61.90% (±0.21) |
| CIFAR10-DVS / 4Conv2FC | 900 | 63.84% (±0.49) |
| ImageNet / SEW-ResNet18 | 100 (0.1/class) | 49.35% (±0.19) |
| ImageNet / SEW-ResNet18 | 1000 | 55.06% (±0.12) |
| ImageNet / SEW-ResNet18 | 50000 | 55.89% (±0.04) |
Key Findings¶
- Under extreme 2-bit quantization, SBC surpasses ExactOBS by 28.11 percentage points on N-MNIST, demonstrating the substantial advantage of the SMP Hessian in low-precision scenarios.
- SEW-ResNet152 is the largest and deepest SNN model successfully pruned to date; SBC is the only method capable of doing so.
- The calibration set can be extremely small (<1 sample/class), which is highly valuable for data-scarce neuromorphic application scenarios.
- The modular per-layer parallel design limits the space complexity of SBC to \(O(B_{\text{out}} \cdot d_{\text{in}}^2)\).
Highlights & Insights¶
- The introduction of VRD loss is both principled and elegant: it brings a classical metric from computational neuroscience for measuring spike train similarity into the model compression setting, seamlessly bridging the gap between the discrete spike outputs of SNNs and the continuous outputs of ANNs.
- The "constant surrogate gradient" simplification underlying SMP is surprisingly effective: the seemingly crude approximation \(g(u) = 1\) is theoretically justified because OBS depends only on the relative magnitudes of Hessian entries.
- The milestone of compressing large-scale SNNs for the first time paves the way for future large-scale SNN deployment.
Limitations & Future Work¶
- Weight compensation during quantization can push post-quantization weights outside the quantization grid, motivating research into better grid selection strategies.
- Only the simplest constant surrogate gradient is employed; more refined surrogate gradient functions may further improve performance.
- The space complexity of the custom Hessian, \(\Theta(d_{\text{out}} \cdot d_{\text{in}}^2)\), can reach 43.5 GB for the largest layers in SEW-ResNet152, leaving little memory headroom for scaling to even larger layers.
- Quantization experiments are conducted only on small neuromorphic datasets and have not been scaled to ImageNet.
Related Work & Insights¶
- SBC generalizes the OBC/GPTQ framework from ANNs to SNNs; the core innovation lies in adapting the loss function and Hessian to reflect SNN-specific characteristics.
- SBC is complementary to UPF (training-time pruning): SBC is well suited for scenarios with existing large pretrained SNNs, while UPF is better suited for progressive compression during training.
- The work also offers insights for ANN compression: when outputs are not continuous values but some form of discrete or structured signal, compression objectives must be designed to match the output modality.
Rating¶
- Novelty: ⭐⭐⭐⭐ The VRD loss and SMP Hessian are elegant designs tailored to SNN characteristics
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Covers 7 datasets, multiple architectures (CNN + Transformer), pruning + quantization, and calibration set ablations
- Writing Quality: ⭐⭐⭐⭐ Rigorous derivations and comprehensive experiments, though some notation is dense
- Value: ⭐⭐⭐⭐⭐ Fills the gap in SNN post-training compression, achieves the first compression of large-scale SNNs, and offers immediate value to the neuromorphic computing community