# Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models
- Conference: AAAI 2026
- arXiv: 2511.17170
- Code: https://github.com/vnht/abca
- Area: LLM Reasoning / Hallucination Detection / Causal Inference
- Keywords: Selective answering, causal abstention, aspect variables, knowledge conflict detection, AIPW estimation
## TL;DR
This paper proposes ABCA (Aspect-Based Causal Abstention), a pre-generation abstention framework that employs dual-agent debate to identify "aspect variables" (e.g., discipline, legal context, temporal frame) for activating distinct knowledge branches within LLMs. It applies the AIPW doubly robust estimator to compute causal effects and uses Centroid Angular Deviation (CAD) to detect knowledge conflicts (Type-1) or knowledge insufficiency (Type-2), achieving 91.4% accuracy on TruthfulQA and 96.4% unanswerable question identification rate—far surpassing the baseline of 44%.
## Background & Motivation
Background: Hallucination detection and selective answering ("say I don't know when uncertain") in LLMs have attracted growing attention. Existing approaches fall into two categories: post-generation detection (Self-Consistency, SelfCheckGPT) and multi-model feedback (LLM Collaboration).
Limitations of Prior Work: (a) Post-generation methods require producing potentially harmful content before making a judgment; (b) most methods perform only coarse-grained "answerability" decisions without distinguishing "knowledge conflict" (contradictory knowledge) from "knowledge insufficiency" (absent knowledge); (c) causal analysis of abstention decisions is lacking—superficial response diversity may be confounded by latent variables such as pretraining and token frequency biases.
Key Challenge: A single LLM query may traverse multiple knowledge pathways, yet direct sampling tends to activate only high-frequency paths. A mechanism is needed to systematically activate distinct knowledge branches and compare their consistency.
Key Insight: Aspect variables are introduced as causal conditioning factors—different disciplinary perspectives, temporal frames, or legal contexts activate different parametric knowledge within the LLM. Abstention decisions are then made causally by comparing response consistency across aspects.
Core Idea: Causal conditioning via aspect variables + AIPW estimation of causal effects + CAD-based detection of knowledge conflict/insufficiency.
## Method
### Overall Architecture
Two stages: (1) dual-agent debate to discover valid aspect variables; (2) AIPW estimation of the causal effect per aspect → CAD-based abstention decision.
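The per-aspect causal effect \(\hat{\tau}(x_i)\) in stage (2) comes from the AIPW (doubly robust) estimator. As a point of reference, here is a minimal sketch of the generic AIPW form on synthetic tabular data; the function name, the toy data-generating process, and the plug-in models are illustrative, not the paper's LLM-specific adaptation:

```python
def aipw_effect(data, mu1, mu0, e):
    """Standard AIPW (doubly robust) estimate of the average treatment effect.

    data: iterable of (x, a, y) with covariate x, binary treatment a, outcome y.
    mu1, mu0: outcome-regression models for E[Y | A=1, X] and E[Y | A=0, X].
    e: propensity model for P(A=1 | X).
    """
    n = 0
    total = 0.0
    for x, a, y in data:
        m1, m0, p = mu1(x), mu0(x), e(x)
        # Outcome-regression term plus inverse-probability-weighted corrections.
        total += (m1 - m0
                  + a * (y - m1) / p
                  - (1 - a) * (y - m0) / (1 - p))
        n += 1
    return total / n

# Toy check: outcome y = x + 2*a, so the true average treatment effect is 2.
data = [(x, a, x + 2 * a) for x in [0.0, 1.0, 2.0] for a in [0, 1]]
tau = aipw_effect(data, mu1=lambda x: x + 2, mu0=lambda x: x, e=lambda x: 0.5)
```

Double robustness means the estimate stays consistent if either the outcome models (`mu1`, `mu0`) or the propensity model (`e`) is correctly specified; in the toy data the outcome models are exact, so the correction terms vanish and the estimate recovers the true effect of 2.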
### Key Designs
- Aspect Discovery (Dual-Agent Debate):
  - DAgent: Explores the model's knowledge space and proposes conditioning dimensions and aspects \(\{x_i\}\).
  - CAgent: Validates whether each aspect satisfies validity criteria—dimensional consistency, temporal precedence, and factual grounding.
  - Iterative debate over \(T\) rounds (default \(T=2\)), producing valid aspects and their weights \(\{w_i\}\).
- AIPW Causal Effect Estimation:
  - For each aspect \(x_i\), \(K\) Chain-of-Thought responses are generated and \(N\) answers are sampled.
  - The doubly robust AIPW estimator is used to compute the causal effect \(\hat{\tau}(x_i)\), combining outcome regression with inverse probability weighting.
  - Significance score: \(\alpha_i = w_i \cdot \hat{\tau}(x_i)\).
- CAD Abstention Strategy:
  - The causally weighted centroid \(\mathbf{c}\) is computed, and the angular deviation \(\theta_i\) between each aspect's response and the centroid is measured.
  - Type-1 Abstention: \(\text{CAD} > \theta_{\max}\) → knowledge conflict (contradictory answers across aspects) → refuse to answer.
  - Type-2 Abstention: \(1 - (\mathbf{c} \cdot \mathbf{e}_{\text{null}}) \leq \rho_{\text{null}}\) → centroid aligns with "I don't know" → knowledge insufficiency → refuse to answer.
  - Otherwise → synthesize an answer from high-significance aspects.
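Putting the CAD rule together, a minimal numeric sketch follows, assuming each aspect's response is reduced to an embedding vector and taking the maximum angular deviation as the CAD statistic (the paper's exact aggregation, thresholds, and embedding model are not reproduced here, so `theta_max`, `rho_null`, and the helper names are illustrative):

```python
import math

def _unit(v):
    """Normalize a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _cos(u, v):
    """Dot product; cosine similarity when both inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))

def abstention_decision(responses, alphas, e_null,
                        theta_max=0.5, rho_null=0.2):
    """Toy CAD abstention rule over per-aspect response embeddings.

    responses: one embedding per aspect; alphas: significance scores
    alpha_i = w_i * tau_hat(x_i); e_null: embedding of "I don't know".
    """
    # Causally weighted centroid c, projected to the unit sphere.
    total = sum(alphas)
    dim = len(responses[0])
    c = _unit([sum(a * r[i] for a, r in zip(alphas, responses)) / total
               for i in range(dim)])
    # Angular deviation of each aspect response from the centroid;
    # CAD is taken here as the maximum deviation (an illustrative choice).
    thetas = [math.acos(max(-1.0, min(1.0, _cos(c, _unit(r)))))
              for r in responses]
    if max(thetas) > theta_max:
        return "abstain:type1"   # knowledge conflict across aspects
    if 1.0 - _cos(c, _unit(e_null)) <= rho_null:
        return "abstain:type2"   # centroid aligns with "I don't know"
    return "answer"              # synthesize from high-significance aspects

# Orthogonal (conflicting) aspect answers trigger a Type-1 abstention.
verdict = abstention_decision([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0],
                              e_null=[0.0, 1.0])
```

Note the ordering: the conflict check runs before the insufficiency check, so widely scattered responses are reported as Type-1 even if their centroid happens to sit near the null embedding.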
## Key Experimental Results
### Main Results (GPT-4.1)
| Dataset | ABCA Acc | Best Baseline | U-Ac (unanswerable identification) |
|---|---|---|---|
| TruthfulQA | 91.4% | 88.1% (CFMAD) | 96.4% vs 44.0% |
| KUQ | 76.8% | 74.1% (CausalAbstain) | 84.6% |
| AVeriTeC | 65.9% | 62.7% (CausalAbstain) | 38.5% |
### Informativeness Score (out of 100)
| Method | TruthfulQA | KUQ | AVeriTeC |
|---|---|---|---|
| CausalAbstain | 75.44 | 74.65 | 79.14 |
| ABCA | 85.45 | 79.56 | 86.45 |
### Ablation Study
| Variant | TruthfulQA Acc | U-Ac |
|---|---|---|
| ABCA (Full) | 91.4% | 96.4% |
| w/o aspect conditioning | 83.5% | 77.4% |
| Single agent (no debate) | 87.1% | 77.4% |
| Uniform weights | 85.1% | 79.8% |
| Lightweight (\(T=K=N=1\)) | 89.5% | 84.5% |
### Key Findings
- Aspect conditioning is the core contributor: removing it drops Acc from 91.4% to 83.5% (−7.9 pp) and U-Ac from 96.4% to 77.4% (−19 pp).
- Unanswerable question identification is exceptionally strong: U-Ac of 96.4% vastly outperforms CFMAD's 44%, dominating all baselines.
- Informativeness is preserved during abstention: abstained responses score 85.41/100 vs. 45–55/100 for baselines—ABCA's abstention explanations are substantially more informative.
- Cross-model generalization: consistent improvements are observed on LLaMA 3.3 70B and Mistral-NeMo 12B.
- Type-1/Type-2 confusion is the primary failure mode: 14.3% of conflict cases are misclassified as insufficiency, and 18.7% of insufficiency cases are misclassified as conflict.
## Highlights & Insights
- Causal conditioning via aspect variables represents a theoretically elegant innovation—rather than simple repeated sampling (Self-Consistency), it purposefully activates parametric knowledge through distinct knowledge pathways.
- The AIPW doubly robust estimator addresses latent confounders (pretraining biases) within LLMs, offering stronger statistical grounding than naive majority voting.
- Distinguishing Type-1 and Type-2 abstention is more fine-grained than the binary "answerable/unanswerable" dichotomy in prior work—knowledge conflict and knowledge insufficiency are fundamentally different failure modes that warrant distinct handling.
## Limitations & Future Work
- Each query requires approximately 24.9 LLM calls, incurring substantial computational overhead.
- Aspect discovery depends on LLM prompting strategies, with no guarantee that discovered aspects satisfy causal criteria.
- When all aspects converge on the same incorrect answer ("consistent hallucination"), ABCA cannot detect the error.
- CAD assumes a shared semantic space, which may fail for knowledge domains with large ontological divergence.
## Rating
- Novelty: ⭐⭐⭐⭐⭐ The theoretical framework combining causal abstention, aspect conditioning, and AIPW estimation is highly complete and novel.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Four datasets, seven baselines, three LLMs, comprehensive ablation, and informativeness evaluation.
- Writing Quality: ⭐⭐⭐⭐⭐ Causal model definitions are rigorous, theoretical derivations are clear, and experimental analysis is thorough.
- Value: ⭐⭐⭐⭐⭐ A U-Ac of 96.4% is a highly significant result with important implications for the safe deployment of LLMs.