
KAN-AD: Time Series Anomaly Detection with Kolmogorov-Arnold Networks

Conference: ICML 2025
arXiv: 2411.00278
Code: None
Area: Time Series
Keywords: Time Series Anomaly Detection, KAN, Kolmogorov-Arnold Networks, B-spline, Fourier Expansion

TL;DR

KAN-AD reformulates time series anomaly detection as approximating sequences with smooth univariate functions. It replaces the B-splines in KAN with truncated Fourier expansions, which are less sensitive to local perturbations. With fewer than 1000 parameters, it improves detection accuracy by an average of 15% across four benchmarks.

Background & Motivation

Background: Time series anomaly detection (TSAD) is a core capability for real-time monitoring in cloud services and web systems. Typical approaches rely on forecasting models: the model predicts the next step, and a large deviation between the prediction and the observed value indicates an anomaly.

Limitations of Prior Work: (a) Forecasting models tend to overfit small fluctuations and are overly sensitive to local perturbations, whereas (b) effective TSAD should focus on the global, smooth patterns of "normal" behavior rather than on fine-grained jitter; (c) directly applying KANs (Kolmogorov-Arnold Networks), despite their theoretical capacity to approximate functions through univariate components, runs into the local nature of B-splines, which makes them highly sensitive to perturbations.

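Key Challenge: Precise fitting vs. robust detection. A model that fits the training signal too precisely also learns to reproduce small deviations, which shrinks the prediction-error gap between normal and anomalous points and degrades detection performance.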

Key Insight: Starting from the Kolmogorov-Arnold representation theorem, this work models time series as a composition of smooth univariate functions.

Core Idea: Replace B-splines with truncated Fourier expansion as the activation functions in KAN. The global nature of Fourier bases naturally mitigates local perturbations, while a lightweight learning mechanism emphasizes global patterns.

Method

Overall Architecture

Input: Time series window → KAN-AD (Fourier bases + lightweight learning mechanism) → Predict next-step value → Calculate prediction error → Anomalies detected if a threshold is exceeded.
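
A minimal sketch of this forecast-then-threshold loop follows. The window length `w`, the absolute-error score, the fixed `threshold`, and the `naive_predict` last-value forecaster are illustrative assumptions. `naive_predict` merely stands in for a trained KAN-AD model, and none of these names come from the paper.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple


def sliding_windows(series: Sequence[float], w: int) -> Iterator[Tuple[List[float], float, int]]:
    """Yield (window, next_value, t) where window = series[t-w:t]."""
    for t in range(w, len(series)):
        yield list(series[t - w:t]), series[t], t


def detect(series: Sequence[float], w: int,
           predict: Callable[[List[float]], float],
           threshold: float) -> List[int]:
    """Flag every t whose absolute next-step prediction error exceeds threshold."""
    flagged = []
    for window, actual, t in sliding_windows(series, w):
        if abs(actual - predict(window)) > threshold:
            flagged.append(t)
    return flagged


def naive_predict(window: List[float]) -> float:
    # Hypothetical stand-in for a trained KAN-AD forecaster: repeat the last value.
    return window[-1]


series = [math.sin(2 * math.pi * t / 20) for t in range(100)]
series[60] += 3.0  # inject a point anomaly
print(detect(series, w=10, predict=naive_predict, threshold=1.0))  # → [60, 61]
```

The stand-in forecaster copies the last observed value, so the spike also corrupts the next prediction, and index 61 is flagged alongside index 60. How far an anomaly leaks into later predictions depends on the forecaster used.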

Key Designs

  1. Replacing B-spline KAN with Fourier KAN:

    • Kolmogorov-Arnold representation theorem: \(f(\mathbf{x}) = \sum_{q=0}^{2n} \Phi_q(\sum_{p=1}^n \phi_{q,p}(x_p))\)
    • Standard KAN parameterizes \(\phi_{q,p}\) with B-splines, which are local basis functions and thus sensitive to small input perturbations.
    • KAN-AD employs truncated Fourier series: \(\phi(x) = a_0 + \sum_{k=1}^K (a_k \cos(kx) + b_k \sin(kx))\)
    • The smoothness of each univariate function is controlled by the Fourier truncation order \(K\).
    • Design Motivation: Fourier basis functions are global, so changing a single coefficient affects the entire curve. A low-order expansion therefore cannot bend locally to follow an isolated perturbation, which naturally provides robustness against local noise.
  2. Lightweight Learning Mechanism:

    • Emphasizes low-frequency information of global patterns.
    • Restricts network capacity to avoid overfitting high-frequency noise.
    • Requires very few parameters (<1000 trainable parameters).
    • Design Motivation: Extremely small models possess inherent regularization effects, forcing the network to capture only the most prominent patterns.
  3. Anomaly Detection Strategy:

    • Training on normal data: Fitting the smooth patterns of "normal" behavior.
    • Testing: Anomalous points deviate from the smooth patterns, yielding large prediction errors, which are flagged as anomalies.
    • Design Motivation: The model's inability to reconstruct anomalous patterns yields a high-error signal.
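
The following pure-Python sketch makes design 1 concrete. It implements a KAN layer whose edge functions are the truncated Fourier series \(\phi(x) = a_0 + \sum_{k=1}^K (a_k \cos(kx) + b_k \sin(kx))\). The class name `FourierKANLayer`, the Gaussian initialization, and the layer sizes in the usage lines are hypothetical. This illustrates the idea only. It is not the authors' architecture, and it omits the lightweight learning mechanism entirely.

```python
import math
import random
from typing import List


class FourierKANLayer:
    """A KAN layer whose learnable edge functions are truncated Fourier series.

    Edge (j, i) computes phi_{j,i}(x) = a0 + sum_{k=1}^{K} (a_k cos(kx) + b_k sin(kx)),
    and output j is the sum of phi_{j,i}(x_i) over all inputs i.
    """

    def __init__(self, n_in: int, n_out: int, K: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_in, self.n_out, self.K = n_in, n_out, K
        # coef[j][i] = [a0, a1..aK, b1..bK]: 2K + 1 numbers per edge (hypothetical init).
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(2 * K + 1)]
                      for _ in range(n_in)]
                     for _ in range(n_out)]

    def num_params(self) -> int:
        return self.n_out * self.n_in * (2 * self.K + 1)

    def edge(self, c: List[float], x: float) -> float:
        """Evaluate one truncated Fourier series at the scalar input x."""
        K = self.K
        a0, a, b = c[0], c[1:K + 1], c[K + 1:]
        return a0 + sum(a[k - 1] * math.cos(k * x) + b[k - 1] * math.sin(k * x)
                        for k in range(1, K + 1))

    def forward(self, x: List[float]) -> List[float]:
        if len(x) != self.n_in:
            raise ValueError(f"expected {self.n_in} inputs, got {len(x)}")
        return [sum(self.edge(self.coef[j][i], x[i]) for i in range(self.n_in))
                for j in range(self.n_out)]


# Hypothetical sizes: a 16-step window -> 4 hidden units -> 1 next-step prediction.
hidden = FourierKANLayer(n_in=16, n_out=4, K=3, seed=1)
head = FourierKANLayer(n_in=4, n_out=1, K=3, seed=2)
window = [math.sin(0.3 * t) for t in range(16)]
prediction = head.forward(hidden.forward(window))[0]
print(hidden.num_params() + head.num_params())  # 16*4*7 + 4*1*7 = 476
```

Stacking two such layers mirrors the inner functions \(\phi_{q,p}\) and outer functions \(\Phi_q\) in the Kolmogorov-Arnold formula. Each edge carries \(2K + 1\) coefficients, so a layer costs \(n_{\text{in}} \cdot n_{\text{out}} \cdot (2K + 1)\) parameters. With the hypothetical sizes above, that gives \(16 \cdot 4 \cdot 7 + 4 \cdot 1 \cdot 7 = 476\). A budget under 1000 parameters is therefore reachable with a small \(K\) and narrow layers.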

Loss & Training

  • Prediction loss: Mean Squared Error (MSE) is used for next-step forecasting.
  • Trained exclusively on normal data (unsupervised anomaly detection paradigm).
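
The training objective can be sketched as plain full-batch gradient descent on the next-step MSE, using only normal data. Several details are assumptions: the window length, \(K\), learning rate, epoch count, and the synthetic sine series are all hypothetical. The sketch uses a single Fourier-KAN layer with one output. That layer is linear in its coefficients, so the MSE gradient has a simple closed form. The paper's actual optimizer and architecture are not reproduced here.

```python
import math
from typing import List, Tuple

Pair = Tuple[List[float], float]


def basis(x: float, K: int) -> List[float]:
    """[1, cos x, ..., cos Kx, sin x, ..., sin Kx] for one scalar input."""
    return ([1.0] + [math.cos(k * x) for k in range(1, K + 1)]
            + [math.sin(k * x) for k in range(1, K + 1)])


def features(window: List[float], K: int) -> List[float]:
    # A one-output Fourier-KAN layer sums phi_i(x_i) over inputs, so it is a
    # linear function of these concatenated per-input bases.
    return [f for x in window for f in basis(x, K)]


def predict(coef: List[float], window: List[float], K: int) -> float:
    return sum(c * f for c, f in zip(coef, features(window, K)))


def mse(coef: List[float], pairs: List[Pair], K: int) -> float:
    return sum((predict(coef, win, K) - target) ** 2 for win, target in pairs) / len(pairs)


def train_mse(pairs: List[Pair], K: int, lr: float, epochs: int) -> List[float]:
    """Full-batch gradient descent on next-step MSE, starting from zero coefficients."""
    feats = [(features(win, K), target) for win, target in pairs]
    coef = [0.0] * len(feats[0][0])
    n = len(feats)
    for _ in range(epochs):
        grad = [0.0] * len(coef)
        for f, target in feats:
            err = sum(c * v for c, v in zip(coef, f)) - target
            for m, v in enumerate(f):
                grad[m] += 2.0 * err * v / n
        coef = [c - lr * g for c, g in zip(coef, grad)]
    return coef


# Hypothetical "normal" training data: a clean sine cut into (window, next value) pairs.
w, K = 8, 2
normal = [math.sin(2 * math.pi * t / 25) for t in range(200)]
pairs = [(normal[t - w:t], normal[t]) for t in range(w, len(normal))]
coef = train_mse(pairs, K, lr=0.02, epochs=200)
print(mse([0.0] * len(coef), pairs, K), "->", mse(coef, pairs, K))
```

The learning rate of 0.02 is chosen for stability. Each input contributes \(1 + K\) to the squared norm of a feature row (since \(\cos^2 + \sin^2 = 1\)), so the Hessian's largest eigenvalue is at most \(2w(1 + K) = 48\). Any step size below \(2/48\) therefore makes the MSE decrease at every epoch.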

Key Experimental Results

Main Results

| Benchmark | Metric | KAN-AD | Prev. SOTA | Gain |
|---|---|---|---|---|
| Benchmark 1 | F1 / AUC | Best | - | Significant |
| Benchmark 2 | F1 / AUC | Best | - | Significant |
| Benchmark 3 | F1 / AUC | Best | - | Peak >27% |
| Benchmark 4 | F1 / AUC | Best | - | Significant |
| Average of 4 benchmarks | Detection accuracy | - | - | +15% |

Ablation Study

| Configuration | Key Metric | Description |
|---|---|---|
| Fourier KAN (KAN-AD) | Best | Global basis functions resist local perturbations |
| B-spline KAN (original KAN) | Inferior | Local basis functions are sensitive to noise |
| Different truncation orders \(K\) | Performance curve | A moderate \(K\) is optimal; an excessive \(K\) leads to overfitting |
| Inference speed | Time elapsed | 50% faster than the original KAN |
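
The explanation in the first two rows comes down to the support of the basis functions. The small sketch below nudges one coefficient in each representation and measures the fraction of a grid over \([-\pi, \pi]\) where the curve changes. It uses degree-1 B-splines (hat functions) as a simpler stand-in for the higher-order B-splines used in KAN. The knot spacing, grid, and bump size are arbitrary choices.

```python
import math
from typing import Callable, List

Curve = Callable[[float], float]


def hat(x: float, center: float, h: float) -> float:
    """Degree-1 B-spline (hat function) centred at `center` with knot spacing h."""
    return max(0.0, 1.0 - abs(x - center) / h)


def spline_curve(coef: List[float], centers: List[float], h: float) -> Curve:
    return lambda x: sum(c * hat(x, cx, h) for c, cx in zip(coef, centers))


def fourier_curve(a: List[float], b: List[float]) -> Curve:
    # a = [a0, a1, ..., aK], b = [b1, ..., bK]
    return lambda x: a[0] + sum(a[k] * math.cos(k * x) + b[k - 1] * math.sin(k * x)
                                for k in range(1, len(a)))


def fraction_changed(before: Curve, after: Curve, grid: List[float],
                     tol: float = 1e-9) -> float:
    return sum(abs(before(x) - after(x)) > tol for x in grid) / len(grid)


grid = [-math.pi + 2 * math.pi * i / 999 for i in range(1000)]

# Spline with 11 hats on [-pi, pi]; bump the coefficient of the hat centred at 0.
h = 2 * math.pi / 10
centers = [-math.pi + j * h for j in range(11)]
base = [0.0] * 11
bumped = base.copy()
bumped[5] += 0.1
local = fraction_changed(spline_curve(base, centers, h),
                         spline_curve(bumped, centers, h), grid)

# Fourier series with K = 3; bump a_1 (the cos x term) by the same amount.
a, b = [0.0] * 4, [0.0] * 3
a_bumped = a.copy()
a_bumped[1] += 0.1
glob = fraction_changed(fourier_curve(a, b), fourier_curve(a_bumped, b), grid)

print(f"spline: {local:.2f} of the grid changed; Fourier: {glob:.2f}")
# → spline: 0.20 of the grid changed; Fourier: 1.00
```

A spline can therefore absorb an isolated blip by adjusting one local coefficient without touching the rest of the curve, and that flexibility is what lets it chase noise. A Fourier coefficient cannot change the curve locally.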

Key Findings

  • Fourier basis functions significantly outperform B-splines on the TSAD task.
  • SOTA performance is achieved with fewer than 1000 parameters—exhibiting extreme parameter efficiency.
  • Inference speed is 50% faster than the original KAN, thanks to the efficient implementation of the Fourier transform.
  • The Fourier truncation order \(K\) acts like frequency bandwidth, controlling the trade-off between fitting precision and robustness.
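
The bandwidth reading of \(K\) can be illustrated with a single truncated Fourier series fitted over time. This is an analogy, not KAN-AD itself, since KAN-AD applies its Fourier functions to input values. The sketch fits a periodic signal containing one injected spike using only the first \(K\) harmonics. On a uniform grid with \(K < N/2\), the discrete Fourier coefficients computed below are exactly the least-squares fit. The signal, spike size, and \(K\) values are hypothetical.

```python
import math
from typing import Dict, List


def truncated_fourier_fit(y: List[float], K: int) -> List[float]:
    """Fit y, sampled at N uniform points over one period, with
    a0 + sum_{k=1}^{K} (a_k cos(kx) + b_k sin(kx)) and return the fitted values.
    For K < N/2 these discrete Fourier coefficients are the least-squares fit."""
    N = len(y)
    if not 0 <= K < N / 2:
        raise ValueError("need 0 <= K < N/2")
    xs = [2 * math.pi * n / N for n in range(N)]
    a0 = sum(y) / N
    fit = [a0] * N
    for k in range(1, K + 1):
        ak = 2 / N * sum(v * math.cos(k * x) for v, x in zip(y, xs))
        bk = 2 / N * sum(v * math.sin(k * x) for v, x in zip(y, xs))
        for n, x in enumerate(xs):
            fit[n] += ak * math.cos(k * x) + bk * math.sin(k * x)
    return fit


N, spike_at, spike = 64, 20, 2.0
xs = [2 * math.pi * n / N for n in range(N)]
y = [math.sin(x) + 0.5 * math.sin(3 * x) for x in xs]  # "normal" periodic pattern
y[spike_at] += spike                                     # one local anomaly

residual_at_spike: Dict[int, float] = {}
for K in (3, 8, 31):
    fit = truncated_fourier_fit(y, K)
    residual_at_spike[K] = y[spike_at] - fit[spike_at]
    print(f"K={K:2d}  residual at spike = {residual_at_spike[K]:.3f}")
```

For any \(K \ge 3\), the clean part is reproduced exactly. The residual at the spike works out to \(2 - 2(2K + 1)/N\) with \(N = 64\), which is about 1.78, 1.47, and 0.03 for \(K = 3, 8, 31\). A small \(K\) leaves the spike as a large error, while a large \(K\) fits it away. This mirrors the ablation's finding that an excessive \(K\) overfits.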

Highlights & Insights

  • Rethinking the Essence of TSAD: Anomaly detection ≠ precise forecasting; rather, it is the identification of deviations from smooth patterns.
  • Practical Application of KAN: Transforms KAN from a theoretical framework into a practical, lightweight model.
  • Extreme Parameter Efficiency: Competing with SOTA models that have on the order of ten thousand parameters while using fewer than 1000 suggests that TSAD is intrinsically low-dimensional.
  • Engineering Value: Compact models + fast inference = ideal for real-time monitoring scenarios.

Limitations & Future Work

  • Fourier expansion assumes a certain degree of periodicity or regularity, which may not hold for completely non-periodic time series.
  • The composition of univariate functions might be insufficient to capture complex dependencies in high-dimensional time series.
  • The truncation order \(K\) currently requires manual selection.
  • Evaluation is limited to four standard benchmarks; the complexity of practical production environments is not yet fully covered.

Related Work & Insights

  • The original KAN (Liu et al. 2024) introduced the Kolmogorov-Arnold network architecture.
  • Classic TSAD methods: LSTM-based, Transformer-based, and Graph Neural Network-based approaches.
  • Statistical approaches, such as STL decomposition, also rely on smoothness assumptions.
  • Insight: A strong inductive bias matters more than large model scale; what TSAD requires is smoothness rather than massive parameter capacity.

Rating

  • Novelty: ⭐⭐⭐⭐ The application of KAN + Fourier in TSAD is novel and theoretically grounded.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Four benchmarks + comprehensive ablation studies + efficiency analysis.
  • Writing Quality: ⭐⭐⭐⭐ Motivating arguments are highly convincing.
  • Value: ⭐⭐⭐⭐⭐ Extreme parameter efficiency and inference speed make it highly suitable for real-world deployment.