L2GTX: From Local to Global Time Series Explanations

Conference: xAI-2026 (Springer proceedings)
arXiv: 2603.13065
Code: None
Area: Time Series
Keywords: explainable AI, time series classification, global explanations, model-agnostic, event-based explanations

TL;DR

L2GTX proposes a fully model-agnostic method for global explanation of time series classifiers. It aggregates the Parameterized Event Primitives (PEPs) generated by LOMATCE into class-level global explanations, and its global fidelity (mean local surrogate \(R^2\)) stays stable across merge settings on six benchmark datasets.

Background & Motivation

Key Challenge

  1. Deep learning models achieve high accuracy in time series classification, but understanding their class-level decision behavior remains challenging.
  2. XAI methods for image and tabular data cannot be directly generalized to time series due to temporal dependencies, variable-length events, and pattern shifts across instances.
  3. Most existing time series explainability methods focus on local explanations (individual predictions), while global explanation methods are scarce.
  4. The few available global methods are model-specific (relying on CAM, LRP, etc.), limiting architecture-agnostic explainability.
  5. Adapting LIME/SHAP to time series typically treats time steps as independent features, ignoring temporal dependencies.
  6. There is a need for a local-to-global aggregation method that can select representative instances and synthesize class-level global explanations.

Method

Overall Architecture

The five-step pipeline of L2GTX is: (1) LOMATCE generates instance-level local explanations \(\rightarrow\) (2) Merge similar event clusters across instances \(\rightarrow\) (3) Compute global cluster importance \(\rightarrow\) (4) Select representative instances under budget constraints \(\rightarrow\) (5) Aggregate events to generate class-level global explanations.

Key Designs

Step 1: LOMATCE Local Attribution

  • For each instance \(X_i\), construct \(S\) perturbed neighborhood samples.
  • Extract Parameterized Event Primitives (PEPs): increasing segments, decreasing segments (start_time, duration, avg_gradient), and local maxima/minima (time, value).
  • Apply K-means clustering and use silhouette analysis to determine the value of \(K\).
  • Train a weighted linear surrogate (ridge regression) to obtain cluster importance \(\hat{\beta}_i\).
  • Retain the top-n important clusters as local explanations.
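
As a rough illustration of the event types listed in Step 1, the sketch below pulls trend segments and local extrema out of a single univariate series. The source does not spell out how LOMATCE detects these events, so the run-based segmentation, the strict-inequality extremum test, and the name `extract_peps` are assumptions made for this example; perturbation, clustering, and the ridge surrogate are left out.

```python
from typing import Dict, List


def extract_peps(x: List[float]) -> Dict[str, List[dict]]:
    """Hypothetical helper: split a series into monotone runs and interior extrema."""
    peps = {"increasing": [], "decreasing": [], "local_max": [], "local_min": []}

    # Trend segments: maximal runs whose consecutive differences keep one sign.
    start = 0
    while start < len(x) - 1:
        diff = x[start + 1] - x[start]
        if diff == 0:  # flat step: belongs to no trend segment in this sketch
            start += 1
            continue
        sign = 1 if diff > 0 else -1
        end = start + 1
        while end < len(x) - 1 and (x[end + 1] - x[end]) * sign > 0:
            end += 1
        duration = end - start
        peps["increasing" if sign > 0 else "decreasing"].append({
            "start_time": start,
            "duration": duration,
            "avg_gradient": (x[end] - x[start]) / duration,
        })
        start = end

    # Local extrema: interior points strictly above or below both neighbours.
    for t in range(1, len(x) - 1):
        if x[t] > x[t - 1] and x[t] > x[t + 1]:
            peps["local_max"].append({"time": t, "value": x[t]})
        elif x[t] < x[t - 1] and x[t] < x[t + 1]:
            peps["local_min"].append({"time": t, "value": x[t]})
    return peps


example = extract_peps([0.0, 1.0, 3.0, 2.0, 1.0, 1.0, 4.0])
```

In this toy series the rise over time steps 0 to 2 and the peak at step 2 become separate primitives, which is what lets later steps cluster trends and extrema by type.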

Step 2: Merge Similar Clusters + Build Instance-Cluster Matrix

  • Execute hierarchical agglomerative clustering (using Euclidean distance) on cluster centroids of the same PEP types across instances.
  • User specifies the merge percentile \(p\) to control the cutoff distance \(\tau\).
  • Construct the instance-cluster matrix \(\mathbf{M} \in \mathbb{R}^{N \times |\mathcal{G}|}\), where entries represent the sum of importance weights of the merged global clusters.
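
The merge in Step 2 can be sketched as follows. The source names Euclidean distance and a percentile-controlled cutoff \(\tau\) but not the linkage criterion or the distribution the percentile is taken over, so this example assumes single linkage (equivalent to taking connected components of the "distance at most \(\tau\)" graph) and sets \(\tau\) to the \(p\)-th percentile of all pairwise centroid distances. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def percentile(values: List[float], p: float) -> float:
    """Linear-interpolation percentile (p in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def merge_centroids(centroids: Sequence[Sequence[float]], p: float) -> List[int]:
    """Label each centroid with a global cluster id (needs at least two centroids)."""
    n = len(centroids)
    dists = {
        (a, b): math.dist(centroids[a], centroids[b])
        for a, b in combinations(range(n), 2)
    }
    tau = percentile(list(dists.values()), p)  # assumed definition of the cutoff

    parent = list(range(n))  # union-find forest over centroids

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single-linkage cut: join every pair that lies within tau.
    for (a, b), d in dists.items():
        if d <= tau:
            parent[find(a)] = find(b)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Centroids of one PEP type collected from several instances (toy values).
cents = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels_p25 = merge_centroids(cents, 25)
labels_p95 = merge_centroids(cents, 95)
```

With these toy centroids a low percentile keeps two global clusters while a high percentile collapses everything into one, which mirrors the role of \(p\) as a granularity knob.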

Step 3: Global Cluster Importance

  • Adopt the aggregation strategy from SP-LIME: \(I_j = \sqrt{\sum_{i=1}^{N} |M_{i,j}|}\)
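
A small worked example of the importance formula, using a made-up \(3 \times 3\) instance-cluster matrix (the values and the function name are illustrative only):

```python
import math
from typing import List


def global_importance(M: List[List[float]]) -> List[float]:
    """I_j = sqrt(sum_i |M[i][j]|), computed column by column."""
    n_clusters = len(M[0])
    return [math.sqrt(sum(abs(row[j]) for row in M)) for j in range(n_clusters)]


# Rows are instances, columns are merged global clusters (toy values).
M = [
    [0.5, 0.0, -0.5],
    [0.5, 0.0, 0.0],
    [0.0, 4.0, 0.0],
]
importance = global_importance(M)
```

The square root dampens clusters that are very heavy in a single instance, and the absolute value means negative surrogate weights still count toward importance.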

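Step 4: Greedy Instance Selection

  • Given a budget \(B\), greedily select instances that cover the most high-importance uncovered clusters.
  • Selection rule: \(i^* = \arg\max_{i \notin S} \sum_j I_j \cdot \mathbf{1}\{M_{i,j} > 0 \land c_j = 0\}\), where \(S\) here denotes the set of instances selected so far (not the sample count from Step 1) and \(c_j = 1\) once cluster \(j\) is covered by a selected instance.
  • The weighted-coverage objective is monotone submodular, so greedy selection attains at least a \(1 - 1/e\) fraction of the best coverage achievable with \(B\) instances.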
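
The selection rule can be sketched in a few lines. Tie-breaking by lowest index and stopping early when no remaining instance adds coverage are assumptions of this example, not details taken from the source; `greedy_select` is a hypothetical name.

```python
from typing import List


def greedy_select(M: List[List[float]], importance: List[float], budget: int) -> List[int]:
    """Pick up to `budget` instances maximising importance-weighted cluster coverage."""
    n, k = len(M), len(importance)
    covered = [False] * k
    selected: List[int] = []
    for _ in range(min(budget, n)):
        best_i, best_gain = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            # Marginal gain: importance of clusters this instance would newly cover.
            gain = sum(importance[j] for j in range(k) if M[i][j] > 0 and not covered[j])
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:  # nothing left to cover (assumed stopping rule)
            break
        selected.append(best_i)
        for j in range(k):
            if M[best_i][j] > 0:
                covered[j] = True
    return selected


# Toy instance-cluster matrix and cluster importances.
M = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]
importance = [3.0, 2.0, 2.0, 1.0]
picked_two = greedy_select(M, importance, budget=2)
picked_all = greedy_select(M, importance, budget=4)
```

Instance 3 is never chosen here because its only cluster is already covered by instance 0, so the budget of 4 is not exhausted.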

Step 5: Event Aggregation

  • Flatten events from selected instances into their corresponding global clusters.
  • Compute the mean and standard deviation of each attribute to serve as class-level descriptions.
  • Trend clusters: statistics on start_time and duration; Extremum clusters: statistics on time and value.
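
A minimal sketch of the aggregation step, assuming each event is stored as a dictionary carrying its global cluster id and kind. The record layout, the cluster names, and the use of the population standard deviation are assumptions for illustration; the source does not say which standard deviation estimator is used.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Attributes summarised per event kind, following the Step 5 description.
ATTRS = {"trend": ("start_time", "duration"), "extremum": ("time", "value")}


def aggregate_events(events):
    """Return {cluster: {attribute: (mean, std)}} over the flattened events."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev["cluster"]].append(ev)

    summary = {}
    for cluster, members in grouped.items():
        kind = members[0]["kind"]  # all members of a cluster share one kind
        summary[cluster] = {
            attr: (
                mean([m[attr] for m in members]),
                pstdev([m[attr] for m in members]),  # population std (assumed)
            )
            for attr in ATTRS[kind]
        }
    return summary


# Events flattened from the selected instances (toy values).
events = [
    {"cluster": "inc_0", "kind": "trend", "start_time": 10.0, "duration": 5.0},
    {"cluster": "inc_0", "kind": "trend", "start_time": 14.0, "duration": 7.0},
    {"cluster": "max_0", "kind": "extremum", "time": 30.0, "value": 1.2},
]
summary = aggregate_events(events)
```

The resulting summary reads as a class-level statement such as "an increasing segment starting around t = 12 (± 2) and lasting about 6 (± 1) steps".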

Loss & Training

  • Global Fidelity (GF) = mean of the local surrogate \(R^2\) of selected instances.
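
The GF definition can be made concrete with a short sketch. It uses the ordinary unweighted \(R^2\); since the local surrogate is a weighted ridge model, the original work may well use a sample-weighted variant, so treat this form and both function names as assumptions.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Unweighted coefficient of determination between black-box and surrogate outputs."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def global_fidelity(r2_by_instance, selected):
    """GF: mean local-surrogate R^2 over the selected instances only."""
    return mean([r2_by_instance[i] for i in selected])


# Toy example: surrogate predictions on one instance's perturbed neighbourhood.
r2_example = r_squared([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 2.0])

# Toy per-instance R^2 values; only instances 0 and 1 were selected.
gf = global_fidelity({0: 0.8, 1: 0.6, 2: 0.1}, selected=[0, 1])
```

Because GF averages only over selected instances, it reflects how well the local surrogates fit around those instances rather than how well the aggregated explanation predicts the black-box model.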

Key Experimental Results

Datasets

ECG200, GunPoint, Coffee, FordA, FordB, CBF (UCR Archive)

Global Fidelity (GF) of FCN Model

Main Results

Dataset    p=25    p=50    p=75    p=95
ECG200     0.784   0.788   0.780   0.792
GunPoint   0.593   0.599   0.601   0.597
Coffee     0.683   0.678   0.678   0.678
FordA      0.674   0.672   0.673   0.672

Global Fidelity of LSTM-FCN Model

Ablation Study

Dataset    p=25    p=50    p=75    p=95
ECG200     0.828   0.832   0.829   0.831
GunPoint   0.617   0.619   0.588   0.638
FordA      0.618   0.621   0.614   0.627

Key Findings

  • GF remains stable across different merge percentiles (overlapping confidence intervals), demonstrating that L2GTX effectively compresses the explanation space without sacrificing fidelity.
  • The number of global clusters decreases monotonically as the merge percentile \(p\) increases, yet GF does not drop.
  • Event primitives (trends, extrema) provide richer semantic explanations than raw time-step importance.

Highlights & Insights

  1. Completely Model-Agnostic: Does not rely on internal model architectures, making it applicable to any black-box time series classifier.
  2. Semantically Rich Event Primitives: Describes temporal behavior using increasing/decreasing trends and local extrema, which is far more meaningful than "importance of a specific time step."
  3. Robust Compression: GF remains stable under different merge thresholds, indicating that global clusters successfully capture shared decision-relevant signals.
  4. Budget Control: Users can adjust the instance selection budget B and merge percentile p to control the granularity of explanations.
  5. Interpretable Class-Level Summaries: Generates human-readable descriptions of "what type of event occurs when."

Limitations & Future Work

  1. Only univariate time series are supported; multivariate expansion requires additional design.
  2. GF is based on \(R^2\), measuring only linear surrogate fidelity rather than directly reflecting global fidelity to the original model.
  3. The scale of the experimental datasets is small (with the largest being FordA with 4921 samples), lacking large-scale validation.
  4. The computational complexity of LOMATCE local explanations grows with the number of neighborhood samples, raising scalability concerns.
  5. The expressiveness of event primitives is limited to predefined types (trends, extrema), potentially missing features like frequency.

Related Work & Insights

  • Conceptually related to SP-LIME in representative instance selection strategy, but L2GTX introduces time series-specific event aggregation.
  • Similar in philosophy to GLocalX (global explanation aggregation for tabular data) but tailored for time series.
  • Event primitives are derived from Kadous's parameterized event framework, aligning well with human intuition for time series.
  • Inspires future research on global explainability for multivariate time series and other sequential data.

Rating

  • Novelty: ⭐⭐⭐⭐ (Reasonable formulation but incremental innovation)
  • Experimental Thoroughness: ⭐⭐⭐⭐ (Small datasets, limited to UCR benchmarks)
  • Writing Quality: ⭐⭐⭐⭐⭐ (Clear logic and complete algorithmic description)
  • Value: ⭐⭐⭐⭐ (Fills the gap in global explanations for time series, but with limited application scenarios)