Using Shapley Interactions to Understand How Models Use Structure
Conference: ACL 2025
arXiv: 2403.13106
Code: None
Area: Others
Keywords: Shapley interaction, syntactic structure, multi-word expression, speech model, non-linear representation
TL;DR
This work uses the Shapley Taylor Interaction Index (STII) to analyze, across text and speech, how models encode syntactic structure, non-compositional semantics, and phonetic coarticulation through non-linear feature interactions. Its main comparative finding is that interactions in the autoregressive model (GPT-2) track syntactic distance consistently, whereas those in the masked model (BERT) do not.
Background & Motivation
Background
- Background: Feature attribution methods such as Shapley values are important tools for understanding neural networks, but they treat feature contributions as independent and linearly additive, ignoring non-linear interactions.
- Limitations of Prior Work: Existing work on Shapley interactions is limited to older architectures such as LSTMs and to simple classification tasks; it has not been extended to modern Transformers or to multimodal settings.
- Key Challenge: Language data is highly structured, and linear attribution cannot reveal how models encode dependencies within those structures.
- Goal: To verify whether STII can capture a model's encoding of linguistic structure across modalities.
- Key Insight: Relate STII to three known kinds of linguistic structure: syntax, semantic compositionality, and phonetic coarticulation.
- Core Idea: Features that are closely related structurally exhibit stronger non-linear interactions.
Method
Overall Architecture
STII measures the strength of the non-linear interaction between a pair of input features. Controlling for positional distance, the study examines how STII relates to syntactic distance, multi-word expression (MWE) membership, and phoneme type.
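The position-controlled comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record fields (`d_i`, `d_p`, `syn_dist`, `stii`), the `min_size` threshold, and the toy values are all hypothetical, and the rank correlation is implemented by hand so the block needs only the standard library. Significance testing of each stratum is omitted.

```python
import math
from collections import defaultdict

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def stratified_spearman(records, min_size=3):
    """Correlate syntactic distance with STII inside each (d_i, d_p) stratum."""
    strata = defaultdict(list)
    for r in records:
        strata[(r["d_i"], r["d_p"])].append(r)
    result = {}
    for key, rows in strata.items():
        if len(rows) < min_size:  # too few pairs to correlate (assumed cutoff)
            continue
        result[key] = spearman([r["syn_dist"] for r in rows],
                               [r["stii"] for r in rows])
    return result

# Toy records (made-up numbers, not results from the paper).
records = [
    {"d_i": 1, "d_p": 1, "syn_dist": 1, "stii": 0.9},
    {"d_i": 1, "d_p": 1, "syn_dist": 2, "stii": 0.7},
    {"d_i": 1, "d_p": 1, "syn_dist": 3, "stii": 0.4},
    {"d_i": 1, "d_p": 1, "syn_dist": 4, "stii": 0.2},
    {"d_i": 2, "d_p": 1, "syn_dist": 1, "stii": 0.5},
    {"d_i": 2, "d_p": 1, "syn_dist": 2, "stii": 0.6},
]
per_stratum = stratified_spearman(records)
```

Because each correlation is computed only among pairs that share the same positional distances, a negative value in a stratum cannot be explained by position alone.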
Key Designs
- STII Computation and Position Control:
    - Function: Computes the Shapley Taylor Interaction Index (STII) for pairs of features while controlling for positional effects.
    - Mechanism: \(\text{STII}_{A,B} = \frac{\| \phi(\emptyset) - \phi(A) - \phi(B) + \phi(A,B) \|_2}{\| \phi(\emptyset) \|_2}\), approximated with Monte Carlo permutation sampling. Positional effects are controlled by stratifying on the interaction-pair distance \(d_i\) and the prediction distance \(d_p\).
    - Design Motivation: STII measures the part of the joint effect that exceeds the sum of the individual effects, which is the non-linear signal expected to reflect structural encoding. Stratification removes the confound of position.
- Three-Level Structural Association Analysis:
    - Function: Relates STII to syntactic structure, non-compositional semantics (MWEs), and phonetic coarticulation, respectively.
    - Mechanism: (a) Syntax: spaCy dependency trees, with Spearman correlation between syntactic distance and STII; (b) Semantics: strong and weak MWEs labeled by AMALGrAM, comparing STII inside and outside MWEs; (c) Speech: Wav2Vec 2.0 with the Montreal Forced Aligner, comparing STII at consonant-vowel and consonant-consonant boundaries.
    - Design Motivation: Consistent results across all three levels support the value of STII as a general interpretability tool.
- Autoregressive vs. Masked Model Comparison:
    - Function: Compares GPT-2 and BERT-base under the same experiments.
    - Mechanism: Applies the identical STII analysis to both models and compares how sensitive each training objective is to syntax.
    - Design Motivation: To test whether the training objective leads models to encode syntactic relations differently.
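The pairwise quantity in the first design item can be illustrated with a small sketch. It is not the authors' implementation and rests on stated assumptions: the model is a hypothetical function `f(present)` that returns an output vector given the set of features left intact, the sampled context is the set of features preceding the earlier of the two in a random ordering, and the result is normalized by the norm of the full-input output. How these choices map onto the \(\phi\) notation above is itself an assumption.

```python
import math
import random

def l2(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def second_difference(f, context, a, b):
    """Elementwise f(T+{a,b}) - f(T+{a}) - f(T+{b}) + f(T) for context T."""
    both = f(context | {a, b})
    only_a = f(context | {a})
    only_b = f(context | {b})
    neither = f(context)
    return [w - x - y + z for w, x, y, z in zip(both, only_a, only_b, neither)]

def pairwise_interaction(f, n_features, a, b, n_samples=200, seed=0):
    """Monte Carlo estimate of a normalized pairwise interaction (sketch)."""
    rng = random.Random(seed)
    total = None
    for _ in range(n_samples):
        order = list(range(n_features))
        rng.shuffle(order)
        # Context: features that precede whichever of a, b comes first.
        first = min(order.index(a), order.index(b))
        context = frozenset(order[:first])
        diff = second_difference(f, context, a, b)
        total = diff if total is None else [t + d for t, d in zip(total, diff)]
    mean_diff = [t / n_samples for t in total]
    full_output = f(frozenset(range(n_features)))
    return l2(mean_diff) / l2(full_output)

# Two hypothetical toy "models" over four features.
def additive(present):
    # Each feature contributes independently: no interaction expected.
    return [float(sum(i + 1 for i in present)), 1.0]

def with_synergy(present):
    # Features 0 and 1 add a bonus only when both are present.
    bonus = 3.0 if {0, 1} <= present else 0.0
    return [float(sum(i + 1 for i in present)) + bonus, 1.0]

no_interaction = pairwise_interaction(additive, 4, 0, 1)
synergy = pairwise_interaction(with_synergy, 4, 0, 1)
unrelated_pair = pairwise_interaction(with_synergy, 4, 2, 3)
```

For the additive toy model the second-order difference cancels in every context, so the estimate is zero; only the pair that shares the joint bonus receives a non-zero score. This is the sense in which the index isolates the part of the joint effect beyond the sum of individual effects.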
Loss & Training
This is an analysis of pre-trained models; no training is involved. Inputs are truncated to 20 tokens, and softmax is applied to the output logits to ensure comparability.
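The two preprocessing steps mentioned above can be sketched as follows. This is only an illustration under assumptions: the helper names are hypothetical, tokens are represented as a plain list, and the softmax uses the usual max-subtraction trick for numerical stability, which the summary does not specify.

```python
import math

MAX_TOKENS = 20  # truncation length stated in the text

def truncate(tokens, max_tokens=MAX_TOKENS):
    """Keep only the first max_tokens tokens of an input."""
    return tokens[:max_tokens]

def softmax(logits):
    """Convert a logit vector into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

tokens = truncate(list(range(50)))
probs = softmax([2.0, 1.0, 0.1])
```

Working with probabilities rather than raw logits puts every output on the same bounded scale, which is one plausible reading of the comparability the text refers to.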
Key Experimental Results
Main Results
| Experiment | GPT-2 (Autoregressive) | BERT (Masked) |
|---|---|---|
| Positional Effect | STII decreases monotonically with distance ✓ | STII decreases monotonically with distance ✓ |
| Syntactic Distance vs STII | All significant cells are negatively correlated | Inconsistent mix of positive and negative |
| Strong MWE Interaction Enhancement | Strong MWE > Weak MWE > General pair ✓ | Strong MWE > Weak MWE > General pair ✓ |
Speech Model (Wav2Vec 2.0):
| Comparison | Mean STII |
|---|---|
| Consonant-Vowel Boundary | Significantly higher |
| Consonant-Consonant Boundary | Lower |
| High Sonority Consonant | Higher (similar to Vowel) |
| Low Sonority Consonant | Lower |
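A group comparison like the one in the table above could be checked along these lines. The summary does not say which significance test the paper uses, so a one-sided permutation test on the difference in means is swapped in here purely as an assumption. The STII values are made-up toy numbers, not results from the paper.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """One-sided test of mean(group_a) > mean(group_b) by label shuffling."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical per-boundary STII values (illustrative only).
cv_boundary = [0.42, 0.51, 0.47, 0.55, 0.49, 0.53]
cc_boundary = [0.21, 0.30, 0.26, 0.19, 0.28, 0.24]

observed_gap, p_value = permutation_test(cv_boundary, cc_boundary)
```

A permutation test makes no distributional assumption about STII values, which is why it is a convenient stand-in when the actual test is unknown.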
Ablation Study
Positional effect baseline:
| Distance Type | GPT-2 | BERT |
|---|---|---|
| \(d_i\) ↑ | STII monotonic ↓ | STII monotonic ↓ |
| \(d_p\) ↑ | STII sharp ↓ | STII sharp ↓ |
Key Findings
- Autoregressive vs. Masked Difference: In GPT-2, syntactic distance is consistently negatively correlated with STII, whereas the correlations in BERT are inconsistent in sign. This suggests that the autoregressive objective is more likely to produce interactions that reflect syntax.
- Non-compositional Semantics Reflected as Non-linear Interactions: Strong MWEs (e.g., "kick the bucket") show stronger interactions than weak MWEs, and this holds for both models.
- Speech Models Capture Coarticulation: Consonant-vowel interactions are stronger than consonant-consonant interactions, and consonants with higher sonority have higher STII, consistent with phonetic theory.
Highlights & Insights
- Unified Cross-Modality Analysis: Covering text and speech, and generation and recognition, STII serves as a general interpretability tool.
- Training Objectives and Structural Encoding: The comparison reveals differences in encoding mechanism between training objectives, not merely differences in performance.
- IPA Consonant Chart as Heatmap Layout: The phonetic experiments arrange their results on the IPA consonant chart, which makes phonetic regularities visible at a glance.
Limitations & Future Work
- Only small models (GPT-2 and BERT-base) are investigated; the findings may not generalize to large language models.
- Only pairwise interactions are explored; the hierarchical structures that would correspond to higher-order interactions are not investigated.
- The analysis is correlational rather than causal.
Related Work & Insights
- vs. Structural Probe: Structural probes detect linearly extractable structural information, whereas STII detects non-linear encoding; the two are complementary.
- vs. Saphra & Lopez (2020): This work extends the analysis to Transformers, multiple linguistic structures, and speech.
- Insights: Non-linear processing is a key missing piece in interpretability; linear analyses capture only part of the picture.
Rating
- Novelty: ⭐⭐⭐⭐ First to systematically associate Shapley interactions with multiple linguistic structures.
- Experimental Thoroughness: ⭐⭐⭐ In-depth analysis but with limited model scale.
- Writing Quality: ⭐⭐⭐⭐ Clear theoretical framework and clever experimental design.
- Value: ⭐⭐⭐⭐ Provides a new methodological perspective for NLP interpretability.