Fractional-Order Spiking Neural Network¶

Conference: ICLR 2026
OpenReview: https://openreview.net/forum?id=NJhBSLJ0nL
Code: https://github.com/PhysAGI/spikeDE
Area: Spiking Neural Networks / Neuromorphic Computing
Keywords: Spiking Neural Networks, Fractional Calculus, Non-Markovian Dynamics, Long-range Dependence, Robustness

TL;DR¶

This work replaces the first-order ODEs that govern membrane potential evolution in spiking neurons with Caputo fractional-order ODEs. The change gives neurons an inherent "long memory" with power-law decay and strictly generalizes the classical IF/LIF models, which are recovered at \(\alpha=1\). The approach achieves higher accuracy and stronger noise robustness on both neuromorphic vision and graph learning tasks.

Background & Motivation¶

Background: Spiking Neural Networks (SNNs) offer extremely low energy consumption on neuromorphic hardware through discrete spike communication and event-driven computing, making them naturally suited for temporal data processing. Currently, almost all SNNs are built upon Integrate-and-Fire (IF) or Leaky Integrate-and-Fire (LIF) neurons, whose dynamics are characterized by first-order ordinary differential equations (ODEs).

Limitations of Prior Work: First-order ODEs imply a Markovian assumption—the current state of the membrane potential depends only on the value of the previous moment, and historical information is rapidly forgotten at an exponential rate \(e^{-t/\tau}\). However, neurophysiological studies demonstrate that real neurons exhibit long-range correlations, fractal dendritic structures, and interactions between multiple membrane conductances. These non-Markovian behaviors cannot be expressed by integer-order models, effectively limiting the representation capacity of the network.

Key Challenge: Fractional calculus provides mathematical tools to describe systems with "memory." The fractional derivative \(d^\alpha/dt^\alpha\) weights the entire history via a power-law kernel. Prior research on single f-LIF neurons showed that they can explain frequency adaptation and generate more reliable spikes under noise, but the systematic integration of fractional neurons into deep SNNs remains unexplored.

Goal: To construct a generalized fractional SNN (f-SNN) framework that subsumes IF/LIF and their variants as special cases where \(\alpha=1\), while providing theoretical guarantees and an open-source toolbox.

Core Idea: [First-order \(\to\) Fractional] Replace the first-order derivative \(d/dt\) in neuron dynamics with the Caputo fractional derivative \(D^\alpha\), transforming the membrane potential charging process into a power-law convolution of history to capture long-range temporal dependencies. [Strict Generalization] \(\alpha\) serves as an additional degree of freedom; \(\alpha=1\) recovers classical SNNs, while \(\alpha < 1\) introduces persistent memory.

Method¶

Overall Architecture¶

The f-SNN does not alter the network structure but replaces the "neuron kernel": the first-order ODE describing membrane potential charging in traditional SNNs is replaced by a fractional ODE (f-IF/f-LIF). This is then discretized using the fractional Adams–Bashforth–Moulton (ABM) numerical method, resulting in an iterative formula that performs power-law weighted convolution over all historical inputs. Since only the charging phase is modified—while spike generation and reset rules remain unchanged—f-SNN can be integrated as a plug-and-play module into any backbone such as CNN, ResNet, Transformer, or MLP. The number of trainable parameters remains identical to the original SNN.

flowchart LR
    X["Input Current / Synaptic Features X_k<br/>(Conv/MLP/ResNet/Transformer)"] --> C["Fractional Charging<br/>U_k = U_0 + Σ c_m^(α) · (·)<br/>Power-law Memory Kernel"]
    C --> S["Spike S_k = H(U_k − θ)<br/>(surrogate gradient)"]
    S --> R["Reset (soft / hard)"]
    R -.History Feedback.-> C
    S --> O["Spike Train Output"]
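
The loop in the flowchart (charge, spike, reset, history feedback) can be sketched in a few lines of NumPy. This is a minimal illustration, not the spikeDE implementation: the function names `power_law_kernel` and `f_if_forward` are hypothetical, the kernel is the \(c_m^{(\alpha)}\) described under Key Designs, and the way the soft reset enters the history (as a drive of \(-\theta\) at the spiking step) is an assumption made here for illustration. With \(\alpha=1\) and \(\tau=1\) the sketch reduces to a standard soft-reset IF neuron.

```python
import math
import numpy as np

def power_law_kernel(alpha, n_steps, tau=1.0):
    """c_m = [(m+1)^alpha - m^alpha] / (tau^alpha * alpha * Gamma(alpha))."""
    m = np.arange(n_steps, dtype=float)
    return ((m + 1.0) ** alpha - m ** alpha) / (tau ** alpha * alpha * math.gamma(alpha))

def f_if_forward(x, alpha, theta=1.0, tau=1.0, u0=0.0):
    """Run charge -> spike -> reset over an input sequence x.

    ASSUMPTION (not taken from the paper): the soft reset is stored in the
    history as a drive of -theta at the step that spiked, so it is fed back
    through the same power-law kernel as the inputs.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = power_law_kernel(alpha, n, tau)
    drive = np.zeros(n)               # effective input history
    u = np.zeros(n)                   # membrane potential after charging
    spikes = np.zeros(n, dtype=int)
    for k in range(n):
        drive[k] = x[k]
        # Fractional charging: weighted sum over the whole history so far.
        u[k] = u0 + np.dot(c[:k + 1], drive[k::-1])
        if u[k] >= theta:             # Heaviside spike rule
            spikes[k] = 1
            drive[k] -= theta         # soft reset, fed back via the history
    return u, spikes

# alpha = 1 behaves like an ordinary soft-reset IF neuron.
u, s = f_if_forward([0.3] * 8, alpha=1.0)
print(s)
```

In training, the hard threshold would be paired with a surrogate gradient; that part is omitted here.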

Key Designs¶

1. Fractional Neuron Dynamics: Replacing "Instant Forgetting" with "Power-law Memory" via Caputo Derivatives. The standard LIF model \(\tau\,dU/dt = -U + R I_{in}\) is replaced by \(\tau\,D^\alpha U(t) = -U(t) + R I_{in}(t)\). For \(0<\alpha<1\), the Caputo fractional derivative is defined as \(D^\alpha y(t) = \frac{1}{\Gamma(1-\alpha)}\int_0^t (t-s)^{-\alpha} y'(s)\,ds\), where the integration variable is written \(s\) to avoid a clash with the membrane time constant \(\tau\). The integral kernel \((t-s)^{-\alpha}\) means that the evolution of the current membrane potential depends on the entire history, weighted by a power law. Intuitively, the order \(\alpha\) acts as a "memory knob": \(\alpha=1\) returns to standard LIF, while decreasing \(\alpha\) below 1 introduces increasingly strong temporal correlations. The relaxation solution under constant input changes from the exponential decay \(e^{-t/\tau}\) to a Mittag–Leffler function \(E_\alpha(-t^\alpha/\tau)\), whose power-law tail \(\sim t^{-\alpha}\) is the mathematical manifestation of "long memory."
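
To make the contrast between exponential and Mittag–Leffler relaxation concrete, the sketch below evaluates \(E_\alpha(z)=\sum_{n\ge 0} z^n/\Gamma(\alpha n+1)\) by a truncated power series. The helper names are hypothetical, and the series is only reliable for moderate \(|z|\), so this is an illustration rather than a production routine. It uses the known identities \(E_1(z)=e^{z}\) and \(E_{1/2}(-x)=e^{x^2}\operatorname{erfc}(x)\) as sanity checks.

```python
import math

def mittag_leffler(alpha, z, n_terms=120):
    """Truncated series E_alpha(z) = sum_n z^n / Gamma(alpha*n + 1).

    Only suitable for moderate |z| and 0 < alpha <= 1 (illustrative sketch).
    """
    return sum(z ** n / math.gamma(alpha * n + 1.0) for n in range(n_terms))

def relaxation(alpha, t, tau=1.0):
    """Relaxation profile E_alpha(-t^alpha / tau) as written in the text."""
    return mittag_leffler(alpha, -(t ** alpha) / tau)

# Compare the integer-order decay (alpha = 1) with a fractional one (alpha = 0.5).
for t in (0.5, 1.0, 4.0):
    print(f"t={t}: alpha=1 -> {relaxation(1.0, t):.4f}, alpha=0.5 -> {relaxation(0.5, t):.4f}")
```

At early times the fractional profile drops faster, but at later times it stays well above the exponential, which is the long tail the paragraph refers to.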

2. Fractional ABM Discretization: Converting Continuous Memory into Computable Power-law Convolution. A first-order ODE can be advanced with a one-step forward Euler iteration, but a fractional ODE is non-local and requires a weighted sum over all past terms. This work employs the fractional ABM predictor, leading to a unified iteration \(y_k = y_0 + \frac{1}{\Gamma(\alpha)}\sum_{j=0}^{k-1}\mu_{j,k}\,f(t_j,y_j)\) with weights \(\mu_{j,k} = \frac{h^\alpha}{\alpha}[(k-j)^\alpha - (k-1-j)^\alpha]\). Because the weights depend only on the lag \(k-1-j\), setting \(h=R=1\) yields a stationary power-law kernel \(c_m^{(\alpha)} = \frac{1}{\tau^\alpha\,\alpha\Gamma(\alpha)}[(m+1)^\alpha - m^\alpha]\), which turns the charging equation into \(U_k = U_0 + \sum_{m=0}^{k-1} c_m^{(\alpha)} X_{k-m}\) (f-IF). At \(\alpha = 1\) the kernel degenerates to the constant \(c_m^{(1)}=1/\tau\), and taking the first-order difference \(U_k - U_{k-1} = X_k/\tau\) exactly recovers the Euler recurrence. Training uses surrogate gradients to handle the non-differentiability of spikes.
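
The charging rule above can be written directly as a NumPy sketch. The function names are hypothetical and spiking and reset are left out on purpose, so only the power-law convolution \(U_k = U_0 + \sum_{m=0}^{k-1} c_m^{(\alpha)} X_{k-m}\) is shown. The final lines check the claim that \(\alpha=1\) gives a constant kernel and a running sum of the inputs.

```python
import math
import numpy as np

def power_law_kernel(alpha, n_steps, tau=1.0):
    """c_m = [(m+1)^alpha - m^alpha] / (tau^alpha * alpha * Gamma(alpha)), m = 0..n_steps-1."""
    m = np.arange(n_steps, dtype=float)
    return ((m + 1.0) ** alpha - m ** alpha) / (tau ** alpha * alpha * math.gamma(alpha))

def f_if_charge(x, alpha, tau=1.0, u0=0.0):
    """Charging without spikes: U_k = U_0 + sum_{m=0}^{k-1} c_m X_{k-m}.

    x[0] holds X_1, so the returned array entry k-1 is U_k.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    c = power_law_kernel(alpha, n, tau)
    # For each k, pair c_0..c_{k-1} with X_k, X_{k-1}, ..., X_1.
    return u0 + np.array([np.dot(c[:k], x[:k][::-1]) for k in range(1, n + 1)])

x = np.array([1.0, 0.0, 2.0, 0.5])
print(power_law_kernel(1.0, 4, tau=2.0))   # constant kernel 1/tau
print(f_if_charge(x, 1.0, tau=2.0))        # equals cumsum(x) / tau
print(power_law_kernel(0.5, 4))            # decaying power-law weights
```

For \(\alpha<1\) the weights decrease with the lag \(m\) but never reach zero, which is why every past input keeps contributing.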

3. Engineering Acceleration: Short-memory Truncation + FFT Convolution. Direct summation over the entire history costs \(O(N^2)\) for \(N\) time steps. The authors apply the short-memory principle to truncate the summation to the most recent \(M\) steps (\(\sum_{j=\max(0,k-M)}^{k-1}\) in the time index \(j\) of the ABM iteration), achieving \(O(NM)\) complexity. When the full memory must be retained, FFT-based convolution reduces the complexity to \(O(N\log N)\), so f-SNN remains trainable on tasks with many time steps.
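
Both accelerations can be illustrated on the linear charging sum alone. The sketch below is written under a simplifying assumption: the whole input sequence is available up front and there is no reset feedback, so the charging is a plain causal convolution. The function names are hypothetical. It compares the direct sum, a window of the \(M\) most recent lags, and a zero-padded FFT convolution.

```python
import math
import numpy as np

def power_law_kernel(alpha, n_steps, tau=1.0):
    """c_m = [(m+1)^alpha - m^alpha] / (tau^alpha * alpha * Gamma(alpha))."""
    m = np.arange(n_steps, dtype=float)
    return ((m + 1.0) ** alpha - m ** alpha) / (tau ** alpha * alpha * math.gamma(alpha))

def charge_direct(x, c):
    """Full-history causal convolution, O(N^2) work."""
    return np.convolve(x, c)[:len(x)]

def charge_truncated(x, c, M):
    """Short-memory variant: keep only the M most recent lags, O(N*M) work."""
    return np.convolve(x, c[:M])[:len(x)]

def charge_fft(x, c):
    """Full-history convolution via FFT, O(N log N) work."""
    n = len(x)
    size = 2 * n  # zero-pad so the circular convolution equals the linear one
    y = np.fft.irfft(np.fft.rfft(x, size) * np.fft.rfft(c, size), size)
    return y[:n]

rng = np.random.default_rng(0)
x = rng.random(64)
c = power_law_kernel(0.5, 64)
print(np.max(np.abs(charge_fft(x, c) - charge_direct(x, c))))           # numerical noise only
print(np.max(np.abs(charge_truncated(x, c, 16) - charge_direct(x, c)))) # truncation error
```

The FFT route reproduces the full sum up to rounding, while the truncated window trades a controlled error for lower cost.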

4. Three Theoretical Guarantees: From "Bio-plausibility" to "Superior Representation + Robustness." The paper establishes three properties that distinguish fractional from integer-order neurons: (i) Persistent Memory: Proposition 1 shows that the f-LIF relaxation solution is a Mittag–Leffler function with a power-law tail, so distant inputs still influence the present with algebraically decaying weight; (ii) Irreducibility: Theorem 2 proves that a single f-IF neuron with \(\alpha \in (0,1)\) cannot be exactly replicated by any finite linear combination of integer-order LIF neurons (the error decays only as \(O(k^{\alpha-1})\)), so an exact match would require infinitely many integer-order units; (iii) Robustness: Theorem 1 proves that under constant input with perturbation \(\epsilon\), the membrane potential deviation of f-IF grows sublinearly, \(\Delta U \propto t^\alpha\), whereas that of integer-order IF grows linearly, \(\Delta U \propto t\). Furthermore, the spike-time sensitivity scales as \(|\Delta t_s|\propto \epsilon\, I_c^{-(1+1/\alpha)}\); since \(1+1/\alpha > 2\) for \(\alpha<1\), this falls off faster in \(I_c\) than the integer-order \(I_c^{-2}\) scaling, which theoretically suppresses long-term error accumulation.
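
The sublinear growth in (iii) can be seen directly from the discrete kernel \(c_m^{(\alpha)}\) of design 2. A constant input perturbation \(\epsilon\) shifts the potential by \(\epsilon\sum_{m=0}^{k-1} c_m^{(\alpha)}\), and the bracketed differences telescope to \(k^\alpha\), giving \(\Delta U_k = \epsilon\,k^\alpha/(\tau^\alpha\Gamma(\alpha+1))\). The sketch below checks this numerically; it illustrates the scaling only and is not the paper's proof of Theorem 1, and the function names are hypothetical.

```python
import math
import numpy as np

def power_law_kernel(alpha, n_steps, tau=1.0):
    """c_m = [(m+1)^alpha - m^alpha] / (tau^alpha * alpha * Gamma(alpha))."""
    m = np.arange(n_steps, dtype=float)
    return ((m + 1.0) ** alpha - m ** alpha) / (tau ** alpha * alpha * math.gamma(alpha))

def deviation(alpha, eps, n_steps, tau=1.0):
    """Delta U_k for k = 1..n_steps under a constant input perturbation eps (no spikes)."""
    return eps * np.cumsum(power_law_kernel(alpha, n_steps, tau))

k = np.arange(1, 101, dtype=float)
dev_frac = deviation(0.5, 0.1, 100)   # grows like k^0.5
dev_int = deviation(1.0, 0.1, 100)    # grows like k
print(dev_frac[-1], dev_int[-1])
print(np.allclose(dev_frac, 0.1 * k ** 0.5 / math.gamma(1.5)))
```

After 100 steps the fractional deviation is roughly an order of magnitude smaller than the integer-order one for the same \(\epsilon\).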

Key Experimental Results¶

Main Results: Neuromorphic Data Classification (Accuracy %, T=8~16)¶

Dataset	Architecture	LIF (SpikingJelly)	LIF (snnTorch)	f-LIF (f-SNN)
N-MNIST	CNN	99.27	99.08	99.48
DVS-Lip	CNN	42.41	32.71	43.42
DVS128Gesture	CNN	93.40	88.99	94.80
DVS128Gesture	Transformer	95.14	87.15	95.83
N-Caltech101	CNN	66.82	65.21	70.26
N-Caltech101	Transformer	72.63	65.67	76.27
HarDVS	CNN	46.10	46.26	47.66

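Replacing IF/LIF with f-IF/f-LIF yields consistent improvements across both CNN and Transformer backbones. The largest gain over the SpikingJelly baseline is +3.64 percentage points on N-Caltech101 with the Transformer backbone (72.63 to 76.27); gains over the snnTorch baseline are larger on several datasets.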

Graph Learning: Node Classification (Accuracy %, T=100, Avg of 20 runs)¶

Method	Cora	Citeseer	Pubmed	Photo	Computers	ogbn-arxiv
SGCN (SJ)	81.81	71.83	86.79	87.72	70.86	50.26
SGCN (f-SNN)	88.08	73.80	87.17	92.49	89.12	51.10
DRSGNN (SJ)	83.30	72.72	87.13	88.31	76.55	50.13
DRSGNN (f-SNN)	88.51	75.11	87.29	91.93	88.77	53.13

On the Computers dataset, SGCN improves by +18.26 percentage points (70.86 to 89.12), with no increase in trainable parameters.

Robustness & Energy Consumption¶

Five-dimensional Adversarial Robustness: f-SNN consistently outperforms two integer-order baselines across noise injection, occlusion blocks, temporal truncation, temporal jitter, and frame loss. The advantage is particularly pronounced under high-intensity noise and large occlusion ratios; feature map visualizations show f-LIF better preserves object features.
Energy: In graph learning tasks, f-SNN achieves significantly lower energy consumption while maintaining higher accuracy, verifying its superior energy efficiency.

Key Findings¶

Gains stem from the memory mechanism rather than parameter count: in a fair comparison, replacing only the charging module keeps parameters perfectly aligned.
\(\alpha\) serves as an additional degree of freedom to capture richer temporal patterns.

Highlights & Insights¶

Theoretical and Methodological Coherence: Starting from the observation that biological neurons are non-Markovian, the work uses fractional calculus for a rigorous mathematical grounding. Three theorems (Memory, Irreducibility, Robustness) explain "why it works" beyond just benchmarking.
Elegance of Strict Generalization: \(\alpha=1\) exactly recovers classic SNNs, and the discretization reverts to Euler as \(\alpha \to 1\). The framework therefore contains the entire integer-order SNN family as a special case, which minimizes adoption friction.
Plug-and-Play + Open-source Toolbox: Only the neuron kernel is replaced without moving the backbone or adding parameters. The spikeDE toolbox supports CNN/ResNet/Transformer/MLP.
Theoretically Grounded Robustness: The contrast between sublinear perturbation growth (\(t^\alpha\)) and linear growth (\(t\)) elevates "noise resistance" from an empirical phenomenon to a provable property.

Limitations & Future Work¶

Computational Overhead: Fractional neurons require convolution over history. Even with short-memory truncation (\(O(NM)\)) or FFT (\(O(N\log N)\)), it remains heavier than the \(O(N)\) complexity of first-order SNNs.
Non-SOTA Positioning: The objective is to improve existing SNNs rather than to achieve absolute SOTA on large-scale datasets such as ImageNet, where SNN performance is still constrained by the compute available to the SNN community.
\(\alpha\) Requires Tuning: The optimal \(\alpha\) is found via hyperparameter search; an end-to-end scheme for adaptive or learnable \(\alpha\) is missing.
Neuromorphic Hardware Deployment: Whether the non-locality of power-law kernels can be efficiently implemented on event-driven hardware remains to be verified.

Related Work¶

SNN Neuron Evolution: From IF/LIF (Stein 1967) to variants with adaptive time constants or threshold learning; this work unifies them as special cases of the fractional framework.
Fractional Neurons: While f-LIF has been studied in computational neuroscience (Teka 2014; Deng 2022) to explain frequency adaptation, this is the first systematic integration into deep SNN frameworks.
Neural f-ODE Robustness: Leveraging findings that neural f-ODEs possess tighter input-output perturbation bounds (Kang 2024c), these properties are successfully transferred to spiking networks.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ First systematic introduction of fractional calculus into deep SNNs with rigorous theorems.
Experimental Thoroughness: ⭐⭐⭐⭐ Covers neuromorphic vision and graph learning across ten datasets with robustness and energy analysis; however, absolute performance on static datasets is relatively lower.
Writing Quality: ⭐⭐⭐⭐ Clear logic from bio-motivation to mathematical framework and theoretical guarantees.
Value: ⭐⭐⭐⭐ Plug-and-play, no extra parameters, and open-sourced, providing a reusable enhancement module for the SNN community.