From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization¶
Conference: ACL 2026 | arXiv: 2604.19884 | Code: None | Area: Model Quantization / Interpretability | Keywords: Post-training quantization, signal degradation, computation collapse, mechanistic interpretability, causal tracing, knowledge recall, PTQ
TL;DR¶
Through systematic mechanistic interpretability analysis, this paper reveals that LLM quantization exhibits two qualitatively distinct failure modes: 4-bit Signal Degradation (computational patterns remain intact but precision is impaired, amenable to local repair) and 2-bit Computation Collapse (functional destruction of critical components, requiring structural reconstruction).
Background & Motivation¶
Background: Post-training quantization (PTQ) is a key technique for efficient LLM deployment. 4-bit quantization is widely regarded as the optimal balance between accuracy and compression, while 2-bit quantization typically triggers a catastrophic "performance cliff"—accuracy plummeting to near zero.
Limitations of Prior Work: Existing research concentrates on three directions: (1) macro-level evaluation (measuring the degree of performance degradation); (2) algorithmic improvements (outlier suppression, rotation matrices, and other numerical optimizations); and (3) preliminary mechanistic exploration (layer/component sensitivity analysis). All three share the limitation of treating quantization damage as a "numerical problem" without probing why internal model mechanisms fail.
Key Challenge: Is the catastrophic failure at 2-bit a quantitative accumulation of 4-bit degradation, or does it represent a qualitative transition? If qualitative, it implies that all current numerically-oriented repair strategies are fundamentally misguided for 2-bit quantization.
Goal: To reveal intrinsic mechanistic differences underlying quantization failures through systematic mechanistic interpretability analysis (layer-wise information flow, causal pathways, component functionality, and representation space), and to validate that different failure modes correspond to different repair strategies.
Key Insight: The authors draw an analogy to signal processing—is the signal weakened by noise (degradation), or is the computation pipeline itself broken (collapse)?
Core Idea: The failures of 4-bit and 2-bit quantization differ not in degree but in kind. Signal degradation can be recovered through targeted, training-free repair, whereas computation collapse requires structural reconstruction (e.g., fine-tuning). This differential repairability is the strongest evidence that the two modes are genuinely distinct.
Method¶
Overall Architecture: Llama-3.1-8B serves as the primary subject; FP16, 4-bit, and 2-bit internal behaviors are systematically compared on a factual knowledge recall task (Pararel). The analysis establishes and validates the hypothesis in stages: macro phenomena → layer-wise probing → causal analysis → component/representation verification → mechanism-guided intervention.
Key Designs:
- Multi-level Knowledge Signal Tracing
- Function: Track the existence and causal transmission integrity of knowledge signals within the model.
- Mechanism: Logit Lens projects hidden states layer by layer into the vocabulary space, tracking the probability/rank of the correct token. Under 4-bit, the signal appears in middle-to-late layers but at reduced strength (degradation); under 2-bit, it remains near zero throughout (absence). Cross-model causal activation patching corroborates this: injecting FP16 "clean" activations at the critical position (the last subject token) of the quantized model restores performance at 4-bit but produces no response at 2-bit. (Both probes are sketched in the first code block after this list.)
- Design Motivation: Distinguishing "signal weakened" from "signal never generated" is the core evidence for establishing the two-mode hypothesis.
- Component-level Functional Diagnosis (Attention + FFN Key-Value Memory)
- Function: Localize which specific components fail and characterize their failure modes.
- Mechanism: For attention, normalized entropy (global concentration) and Jensen-Shannon divergence (JSD; focal deviation) are used. For the FFN, the metrics are gate sign-flip rate (SFR; >30% indicates severe instability), Jaccard overlap of the top-1% activated neurons (≈0.1 indicates complete activation misalignment), and output cosine similarity (≈0 indicates complete deviation of the semantic direction). Under 2-bit, all metrics indicate functional component collapse. (These metrics are sketched in the second code block after this list.)
- Design Motivation: Attributing the macro-level "signal absence" to specific component failures confirms whether the issue is precision loss or functional breakdown.
- Mechanism-aware Two-stage Repair vs. System Irreversibility Validation
- Function: Validate that the two failure modes exhibit fundamentally different repairability.
- Mechanism: For 4-bit, a "source protection + signal recovery" strategy is designed: protect the first few layers (Llama/Mistral keep the first 2 layers at 8-bit, ~4.25 avg bits; Qwen/Gemma use kurtosis-based layer selection, ~4.1 avg bits) and apply peak signal amplification (\(\alpha\)-fold logit scaling). At 2-bit, both this strategy and EORA low-rank compensation prove ineffective. A "domino experiment" shows that 2-bit quantizing only the first 2 layers collapses robust-subset accuracy from 100% to 41.65%. (Both repair steps are sketched in the third code block after this list.)
- Design Motivation: The difference in repairability is the most direct and compelling practical evidence distinguishing the two failure modes.
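The two signal-tracing probes reduce to a few lines of PyTorch. Below is a minimal sketch, assuming a HuggingFace-style `LlamaForCausalLM`; since the paper releases no code, module paths like `model.model.norm` and `model.lm_head`, the single-token answer, and the shared device are illustrative assumptions.

```python
# Minimal sketch of the two signal-tracing probes: Logit Lens rank tracking
# and cross-model activation patching at the last subject token.
import torch

@torch.no_grad()
def logit_lens_ranks(model, tokenizer, prompt: str, answer: str) -> list[int]:
    """Rank of the correct answer token at each layer (1 = top prediction)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # First answer token (sketch assumes a single-token answer).
    answer_id = tokenizer(answer, add_special_tokens=False).input_ids[0]
    out = model(**inputs, output_hidden_states=True)
    ranks = []
    for h in out.hidden_states[1:]:                       # one entry per decoder layer
        h_last = h[0, -1]                                 # hidden state at the last position
        logits = model.lm_head(model.model.norm(h_last))  # project into vocab space
        ranks.append((logits > logits[answer_id]).sum().item() + 1)
    return ranks

@torch.no_grad()
def patch_and_predict(fp16_model, quant_model, tokenizer, prompt: str,
                      layer_idx: int, subject_pos: int) -> int:
    """Inject the FP16 activation at the last subject token into the
    quantized model at layer `layer_idx`; return the predicted token id.
    Assumes both models sit on the same device."""
    inputs = tokenizer(prompt, return_tensors="pt").to(fp16_model.device)
    clean = fp16_model(**inputs, output_hidden_states=True)
    # hidden_states[i + 1] is the output of decoder layer i.
    clean_act = clean.hidden_states[layer_idx + 1][0, subject_pos].clone()

    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[0, subject_pos] = clean_act.to(hidden.dtype)  # overwrite in place
        return output

    handle = quant_model.model.layers[layer_idx].register_forward_hook(hook)
    try:
        logits = quant_model(**inputs).logits[0, -1]
    finally:
        handle.remove()
    return logits.argmax().item()
```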
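The FFN diagnostics are three tensor comparisons. A sketch under the assumption that the gate pre-activations, activations, and FFN outputs have already been captured with forward hooks at matching positions in the FP16 and quantized models; the thresholds follow the paper's reported criteria (SFR > 30%, top-1% Jaccard ≈ 0.1, cosine ≈ 0).

```python
# FFN key-value memory diagnostics: gate sign-flip rate, Jaccard overlap of
# the top-1% activated neurons, and cosine similarity of the FFN output.
import torch

def sign_flip_rate(gate_fp16: torch.Tensor, gate_quant: torch.Tensor) -> float:
    """Fraction of gate pre-activations whose sign flips after quantization."""
    return (torch.sign(gate_fp16) != torch.sign(gate_quant)).float().mean().item()

def top1pct_jaccard(act_fp16: torch.Tensor, act_quant: torch.Tensor) -> float:
    """Jaccard overlap of the top-1% most activated neurons in each model."""
    k = max(1, act_fp16.numel() // 100)
    top_a = set(torch.topk(act_fp16.abs().flatten(), k).indices.tolist())
    top_b = set(torch.topk(act_quant.abs().flatten(), k).indices.tolist())
    return len(top_a & top_b) / len(top_a | top_b)

def output_cosine(out_fp16: torch.Tensor, out_quant: torch.Tensor) -> float:
    """Cosine similarity between FP16 and quantized FFN output directions."""
    return torch.nn.functional.cosine_similarity(
        out_fp16.flatten(), out_quant.flatten(), dim=0).item()
```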
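For the repair strategy, the source-protection arithmetic is directly checkable, and the amplification step can be sketched. Since no code is released, `amplify_peak_layer` is only one plausible reading of "\(\alpha\)-fold logit scaling" (here interpreted as scaling the peak-signal layer's residual update by \(\alpha\)); the paper's exact formulation may differ.

```python
# Sketch of the two-stage repair: source protection (mixed precision) plus
# peak signal amplification. The amplification is an assumed formulation.
import torch

def average_bits(num_layers: int = 32, protected: int = 2,
                 hi_bits: int = 8, lo_bits: int = 4) -> float:
    """Average weight bit-width when the first `protected` layers stay at 8-bit."""
    return (protected * hi_bits + (num_layers - protected) * lo_bits) / num_layers

# For 32-layer Llama-3.1-8B: (2*8 + 30*4) / 32 = 4.25, matching the
# reported ~4.25 average bits for Llama/Mistral.
assert average_bits() == 4.25

def amplify_peak_layer(model, peak_layer: int, alpha: float):
    """Scale the residual update of the peak-signal layer by alpha
    (hypothetical reading of alpha-fold logit scaling; run under
    torch.no_grad() since the hook mutates activations in place)."""
    def hook(_module, inputs, output):
        hidden_in = inputs[0]
        hidden_out = output[0] if isinstance(output, tuple) else output
        # Amplify only this layer's contribution to the residual stream.
        hidden_out.copy_(hidden_in + alpha * (hidden_out - hidden_in))
        return output
    return model.model.layers[peak_layer].register_forward_hook(hook)
```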
Key Experimental Results¶
4-bit Repair Experiments (Accuracy on Failure Subset):
| Model | Baseline (4-bit) | +Basic Repair | +Signal Amplification (Final) |
|---|---|---|---|
| Llama3.1-8B | 0.00% | 67.91% | 75.19% (\(\alpha\)=3) |
| Mistral-7B | 0.00% | 66.86% | 81.26% (\(\alpha\)=9) |
| Qwen3-8B | 0.00% | 40.24% | 79.88% (\(\alpha\)=7) |
| Gemma2-9B | 0.00% | 33.85% | 64.08% (\(\alpha\)=2) |
2-bit "Domino Effect" (Llama3.1-8B):
| Quantized Layers | Robust Subset | Failure Subset |
|---|---|---|
| None (FP16) | 100.00% | 100.00% |
| Layer 0 | 65.47% | 15.03% |
| Layers 0–1 | 41.65% | 5.29% |
| Layers 0–5 | 2.51% | 0.38% |
Representation Space Structure Analysis (see the linear-CKA sketch below):
- 4-bit: the CKA map retains a clear diagonal structure; activation-subspace similarity to FP16 is >0.8.
- 2-bit: the CKA map is nearly entirely dark (structural collapse); activation-subspace similarity is ≈0.
- 4-bit: error-subspace alignment with the signal is ≈0.3 (resembling random noise).
- 2-bit: error-subspace alignment with the signal is ≈0.8 (directly interfering with principal features).
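The comparison uses standard linear CKA. A minimal sketch, assuming `X` and `Y` are `[n_tokens, hidden]` activation matrices captured at corresponding layers of the FP16 and quantized models; computing it over all layer pairs yields the heatmap whose diagonal survives at 4-bit and vanishes at 2-bit.

```python
# Linear Centered Kernel Alignment (CKA) between two activation matrices.
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> float:
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F), X and Y centered."""
    X = X - X.mean(dim=0, keepdim=True)  # center each feature
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (Y.T @ X).norm() ** 2         # ||Y^T X||_F^2
    return (hsic / ((X.T @ X).norm() * (Y.T @ Y).norm())).item()
```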
Key Findings:
- At 4-bit, the correct answer's rank decreases but it remains in the Top-5; at 2-bit, the rank collapses (dropping to the thousands, equivalent to random guessing).
- Architecture-dependent degradation patterns: Llama/Mistral exhibit an "early-layer representation bottleneck," while Qwen/Gemma exhibit "uniform degradation."
- 2-bit models fail to correctly process even high-precision signal inputs: the components themselves have ceased to function.
- The distinction between the two failure modes is consistent across both GPTQ and AWQ quantization methods.
Highlights & Insights¶
- Framework Value of Qualitative Distinction: This work is the first to systematically demonstrate that 4-bit and 2-bit failures are not different degrees along the same continuum, but two fundamentally distinct failure modes.
- Complete Closed Loop from Diagnosis to Repair: Mechanistic analysis directly guides repair strategy design, and the differential effectiveness of repairs reciprocally validates the diagnosis.
- Compelling "Domino Experiment": Quantizing only the first 2 layers at 2-bit causes catastrophic collapse, and 30 subsequent FP16 layers cannot recover performance—vividly demonstrating the irreversibility of computation collapse.
- Insightful Error Direction Analysis: The high alignment of 2-bit quantization error with the signal subspace implies that the noise is not random but systematically destroys the model's core features.
Limitations & Future Work¶
- The study focuses on weight-only quantization; failure modes of activation quantization remain to be investigated.
- Evaluation is anchored to factual recall tasks; performance on complex reasoning tasks warrants further verification.
- The repair strategies incur additional precision overhead (~4.1–4.25 avg bits), and practical efficiency requires further optimization.
- The boundary between the two modes (3-bit behavior) deserves deeper investigation.
- The failure mode demarcation point may differ across model architectures.
Related Work & Insights¶
- GPTQ (Frantar et al., 2023): The most widely used weight-only PTQ method and the primary quantization baseline in this work.
- Causal Tracing (Meng et al., 2022): A knowledge localization method extended here into cross-model repair experiments.
- Logit Lens (nostalgebraist, 2020): An intermediate-layer decoding tool.
- SpQR (Dettmers et al., 2023): A mixed-precision method that resonates with the source protection strategy proposed in this work.
- Insights: Quantization research should move beyond numerical optimization; mechanistic understanding is essential for breaking through performance bottlenecks. Practical 2-bit quantization requires a paradigm shift from "compensation" to "reconstruction."
Rating¶
- Novelty: ★★★★★ — The systematic distinction and validation of two failure modes constitutes a novel and significant contribution.
- Experimental Thoroughness: ★★★★★ — Four models, multi-level analysis, and multi-metric validation yield a complete evidence chain.
- Writing Quality: ★★★★★ — The narrative progresses clearly from phenomena → hypotheses → validation → intervention.
- Value: ★★★★☆ — Provides an important diagnostic framework and mechanistic insights for quantization research.