# KTCF: Actionable Recourse in Knowledge Tracing via Counterfactual Explanations for Education
Conference: AAAI 2026
arXiv: 2601.09156
Code: To be confirmed
Area: Causal Inference
Keywords: Knowledge Tracing, Counterfactual Explanation, XAI, Actionable Recourse, Education
## TL;DR
This paper proposes KTCF, a counterfactual explanation method for Knowledge Tracing (KT) that leverages inter-concept relationships to produce sparse, actionable counterfactuals, which are then post-processed into an ordered sequence of instructional recommendations. KTCF outperforms baseline methods across validity, sparsity, and actionability metrics.
## Background & Motivation
- Deep learning-based Knowledge Tracing (KT) is widely applied in education to model student knowledge states and predict future performance.
- The EU AI Act classifies educational AI as high-risk, requiring interpretability; the U.S. Department of Education likewise emphasizes preserving human decision-making authority.
- Existing KT interpretability research primarily addresses what the model attends to (attention heatmaps) or whether it accurately captures learning processes (psychometric parameters).
- Methods addressing why and how questions—i.e., "why did the model produce this prediction" and "what should a student do to change the outcome"—remain largely unexplored.
- Counterfactual explanations are inherently causal and local, and are readily comprehensible to non-specialist educational stakeholders (students, teachers, parents), yet they have not been adequately explored in the KT domain.
## Core Problem
Given a student's learning history \(X^{\text{orig}} = [(kc_1, r_1), \ldots, (kc_t, r_t)]\) and a trained KT model \(f\), when the model predicts that the student will answer the target knowledge concept \(kc_{\text{target}}\) incorrectly, the goal is to find a minimally modified counterfactual response sequence \(R^{\text{cf}}\) that flips the prediction to correct, while satisfying:
- Intervention sparsity: the number of modifications is minimized.
- Actionability: only incorrect-to-correct flips are permitted (not the reverse).
- KC-relation consistency: modified knowledge concepts are related to the target concept in the knowledge graph.
- Sequential output: explanations are presented as ordered instructional steps that minimize the student's total learning burden.
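The constraints above can be summarized as a constrained optimization problem. The formalization below is one plausible reading rather than the paper's exact notation; in practice the hard KC-relation constraint is relaxed into a soft penalty (the KC loss in the method section):

```latex
\min_{R^{\text{cf}}} \ \lVert R^{\text{cf}} - R^{\text{orig}} \rVert_{0}
\quad \text{s.t.} \quad
f\!\left(X^{\text{cf}},\, kc_{\text{target}}\right) > 0.5,
\qquad
r_i^{\text{cf}} \ge r_i^{\text{orig}} \ \ \forall i
```

where the last constraint encodes actionability: responses may only move from incorrect to correct, never the reverse.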
## Method
### Counterfactual Explanation Generation (Algorithm 1)
KTCF formulates counterfactual generation as an optimization problem with a three-term loss:
- Prediction loss \(\mathcal{L}_{\text{pred}}\): binary cross-entropy between the KT model's output on the counterfactual input and the target probability of 1.0, driving prediction flipping.
- Sparsity loss \(\mathcal{L}_{\text{spar}}\): Hamming distance between the original and counterfactual responses, encouraging minimal modification.
- KC loss \(\mathcal{L}_{\text{kc}}\): penalizes modifications to KCs that are distant from the target KC in the predefined undirected KC relation graph \(G_{\text{kc}}\), ensuring structural relevance.
Stochastic optimization is performed with the Adam optimizer. After each update step, an actionability mask \(\mathbf{m}\) is applied as a projection to ensure that only originally incorrect responses can be modified, achieving 100% actionability.
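The loop above (three-term loss, gradient step, projection) can be sketched in NumPy using a toy logistic model as the KT predictor and plain gradient descent in place of Adam. All names, weights, and hyperparameters here are illustrative, not the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generate_counterfactual(r_orig, w, b, kc_dist,
                            lam_spar=0.1, lam_kc=0.05, lr=0.5, steps=200):
    """Gradient-based counterfactual search with an actionability projection.

    r_orig  : binary response history (1 = correct, 0 = incorrect)
    w, b    : parameters of a toy logistic 'KT model' p = sigmoid(w @ r + b)
    kc_dist : graph distance from each interaction's KC to the target KC
    """
    r = r_orig.astype(float).copy()
    for _ in range(steps):
        p = sigmoid(w @ r + b)
        # L_pred = -log p (BCE against target 1.0); dL/dr = (p - 1) * w
        grad = (p - 1.0) * w
        # L_spar: L1 relaxation of the Hamming distance to the original
        grad += lam_spar * np.sign(r - r_orig)
        # L_kc: change penalty weighted by graph distance to the target KC
        grad += lam_kc * kc_dist * np.sign(r - r_orig)
        r -= lr * grad
        # Actionability projection: correct answers stay correct, incorrect
        # ones may only move towards correct -- no correct-to-incorrect flips.
        r = np.clip(np.maximum(r, r_orig), 0.0, 1.0)
    return (r > 0.5).astype(int)

# Toy example: the model initially predicts failure (p < 0.5); flipping the
# two high-weight incorrect items suffices to flip the prediction, while the
# low-weight, KC-distant item 3 is left untouched.
r_orig = np.array([1, 0, 0, 1, 0])
w = np.array([0.5, 2.0, 0.1, 0.5, 1.5])
b = -3.0
kc_dist = np.array([0, 1, 4, 0, 1])  # item 3's KC is far from the target
r_cf = generate_counterfactual(r_orig, w, b, kc_dist)
```

The projection step is the key difference from Wachter-style search: it is applied after every update, so intermediate iterates never leave the actionable region, rather than being filtered afterwards.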
### Initialization Strategies
Five initialization strategies are compared: Gaussian noise (-rn), random binary (-rand), soft relaxation (-sr), convex combination (-cc), and Gumbel-Sigmoid relaxation (-gs). Experiments show that the continuous, relaxation-style initializations (-rn, -sr, -gs) achieve the best performance.
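The Gumbel-Sigmoid variant (-gs) can be sketched as follows. The paper's exact parameterization is not reproduced here; this uses the standard binary Gumbel-Sigmoid construction, where the difference of two independent Gumbel samples acts as logistic noise on the response logits:

```python
import numpy as np

def gumbel_sigmoid_init(r_orig, tau=1.0, seed=0):
    """Continuous (0, 1) starting point for gradient-based counterfactual
    search: perturb the logits of the original binary responses with
    Gumbel noise and squash with a temperature-scaled sigmoid."""
    rng = np.random.default_rng(seed)
    # Logits of the (slightly smoothed) binary responses.
    p = np.clip(r_orig.astype(float), 0.05, 0.95)
    logits = np.log(p / (1.0 - p))
    # Difference of two Gumbel samples = logistic noise, the standard
    # binary Gumbel-Sigmoid construction.
    noise = rng.gumbel(size=r_orig.shape) - rng.gumbel(size=r_orig.shape)
    return 1.0 / (1.0 + np.exp(-(logits + noise) / tau))

r0 = gumbel_sigmoid_init(np.array([1, 0, 0, 1, 0]))
```

Lower temperatures `tau` push the initialization closer to binary values; higher temperatures give a softer, more exploratory starting point.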
### Post-processing: Instructional Sequence Generation (Algorithm 2)
Counterfactual explanations are converted into an executable sequential learning path for students:
- Extract the set of all modified KCs \(\overline{V}^{\text{CF}}\) from the counterfactual.
- Compute pairwise shortest-path distances between all KCs using Dijkstra's algorithm on the KC relation graph.
- Construct a complete subgraph over these KCs with edge weights equal to shortest-path distances.
- Apply a greedy nearest-neighbor strategy starting from the target KC to approximately solve the resulting Traveling Salesman Path Problem (TSPP), then reverse the path so it ends at the target KC.
- The resulting sequence specifies the recommended order in which the student should review KCs, minimizing total learning burden (total path distance).
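The steps above (all-pairs Dijkstra, a complete distance subgraph, greedy nearest-neighbor TSPP, and a final reversal) can be sketched in a few lines of stdlib Python. The KC graph and concept names below are a hypothetical toy example built around the paper's leap-year case, not data from the paper:

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances from src in an undirected weighted graph
    given as {node: [(neighbor, weight), ...]}."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def instructional_sequence(adj, modified_kcs, target_kc):
    """Greedy TSPP heuristic: from the target KC, repeatedly hop to the
    nearest unvisited modified KC, then reverse the path so the sequence
    of review steps ends at the target concept."""
    nodes = set(modified_kcs) | {target_kc}
    dist = {kc: dijkstra(adj, kc) for kc in nodes}
    path, remaining = [target_kc], set(modified_kcs) - {target_kc}
    while remaining:
        nxt = min(remaining,
                  key=lambda kc: dist[path[-1]].get(kc, float("inf")))
        path.append(nxt)
        remaining.remove(nxt)
    return list(reversed(path))

# Hypothetical toy KC graph around the leap-year example.
adj = {
    "leap_year": [("calendar", 1), ("mod", 2)],
    "calendar":  [("leap_year", 1), ("mod", 1)],
    "mod":       [("calendar", 1), ("leap_year", 2), ("division", 1)],
    "division":  [("mod", 1)],
}
seq = instructional_sequence(adj, ["mod", "division", "calendar"], "leap_year")
```

Reversing the greedy path means the student works from the most remote prerequisite toward the target concept, which is the ordering the method reports as the recommended review sequence.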
## Key Experimental Results
### Datasets and Setup
- Dataset: XES3G5M, containing 5,549,635 interaction records from 18,066 students (mathematics domain).
- KC relation graph: 1,175 nodes, 1,304 edges.
- KT model: DKT (LSTM), test AUC 0.8226, accuracy 0.8253.
- Evaluation: 200 instances randomly sampled from students with error rates above 45%.
### Main Results (Table 1)
| Method | Validity↑ | Sparsity↓ | Actionability↓ |
|---|---|---|---|
| Wachter-rand | 0.725 | 67.36 | 40.53 |
| DiCE-rand | 0.880 | 75.59 | 35.32 |
| KTCF-rn | 0.930 | 49.85 | 0.000 |
| KTCF-gs | 0.920 | 49.92 | 0.000 |
- KTCF-rn improves validity by 28.3% and reduces sparsity by 26.0% over Wachter; it improves validity by 5.7% and reduces sparsity by 34.0% over DiCE.
- All KTCF variants achieve an actionability score of 0 (fully eliminating non-actionable modifications), whereas baseline methods contain substantial non-actionable suggestions.
- KTCF-gs is the fastest variant (2.2s), 41.9% faster than Wachter.
### Qualitative Analysis (Table 2)
Using "distinguishing leap years from non-leap years" as the target KC:
- KTCF recommends only 5 KCs (modular arithmetic, calendar cycles, etc.) with a total path distance of 26.
- Wachter recommends 14 KCs (including irrelevant concepts such as tree diagrams and decimal units) with a total path distance of 66.
- DiCE recommends 20 KCs with a total path distance of 79.
## Highlights & Insights
- First systematic application of counterfactual XAI to knowledge tracing, providing a complete conceptualization framework (explanandum/explanans).
- Actionability guarantee: the masking mechanism achieves 100% actionability, ensuring recommendations exclusively involve practicing incorrect items to mastery and never suggest deliberate errors.
- Educational theory grounding: the method is connected to Bloom's mastery learning theory and the 2-Sigma problem, lending it pedagogical theoretical significance.
- KC-relation constraints: embedding KC graph structure into the loss function ensures that recommended modifications are coherent within the knowledge topology.
- Post-processing design: the use of TSPP for generating optimal learning paths elegantly minimizes the student's total learning burden.
## Limitations & Future Work
- Validation is limited to DKT (LSTM); Transformer-based KT models (e.g., AKT, SAINT) have not been tested, leaving generalizability unverified.
- The KC relation graph depends on predefined structures and is not applicable to datasets lacking such information.
- Counterfactual generation over binary responses remains sensitive to initialization strategies; although the KC loss improves robustness, the underlying issue is not fundamentally resolved.
- The greedy TSPP post-processing yields a suboptimal solution; path quality has room for improvement.
- No real user study has been conducted to validate actual instructional effectiveness (only qualitative case analysis is provided).
- Evaluation is based on only 200 instances, which is a relatively small scale.
## Related Work & Insights
- vs. Wachter/DiCE: these general-purpose counterfactual methods do not account for KC relationships or educational actionability constraints, resulting in redundant suggestions and non-actionable modifications. KTCF addresses both issues through the KC loss and actionability mask.
- vs. attention heatmap explanations (AKT, EKT, etc.): these answer only "where did the model attend" and cannot provide actionable instructional recommendations.
- vs. psychometric parameter explanations (IRT-based): difficulty and discrimination parameters support diagnosis but similarly provide no action guidance for improvement.
- vs. SHAP/LRP post-hoc explanations (Lu et al. 2024): these explain feature contribution directions but do not generate counterfactual guidance; Lu et al.'s user study demonstrates that explanations improve trust, and KTCF advances further by providing actionable steps.
- The post-processing idea of TSP-based path optimization generalizes to other scenarios requiring ranked recommendations, such as learning path planning in recommender systems.
- The KC-relation graph constraint is transferable to domains such as medical diagnosis, where a disease relation graph could constrain the plausibility of counterfactual suggestions.
- The actionability mask is a general technique applicable to any counterfactual generation problem with unidirectional actionability constraints.
- A promising future direction is combining KTCF with LLMs to convert counterfactual explanations into natural language instructional feedback.
## Rating
- Novelty: ⭐⭐⭐⭐ (first systematic application of counterfactual XAI to KT with a complete conceptual framework)
- Experimental Thoroughness: ⭐⭐⭐ (ablation studies are sufficient, but evaluation scale is small and user studies are absent)
- Writing Quality: ⭐⭐⭐⭐ (educational theory and technical methodology are naturally integrated; qualitative analysis is intuitive)
- Value: ⭐⭐⭐⭐ (meaningful practical contribution to interpretability in educational AI)