
✏️ Knowledge Editing

💬 ACL2025 · 19 paper notes

📌 Same area in other venues: 📷 CVPR2026 (2) · 🔬 ICLR2026 (15) · 💬 ACL2026 (10) · 🧪 ICML2026 (8) · 🤖 AAAI2026 (4) · 🧠 NeurIPS2025 (6)

🔥 Top topics: LLM ×3 · Adversarial Robustness ×2

A General Knowledge Injection Framework for ICD Coding: This paper proposes GKI-ICD, a general knowledge injection framework. By employing guideline synthesis and multi-task learning mechanisms, it simultaneously integrates three types of ICD knowledge—Description, Synonym, and Hierarchy—without requiring extra network modules, achieving SOTA performance on the MIMIC-III benchmark.
ToxEdit: Adaptive Detoxification Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing: This paper proposes ToxEdit, a toxicity-aware knowledge editing method that uses an SVM classifier to detect harmful hidden states in the early layers of the LLM forward pass. A routing mechanism directs harmful inputs to edited FFN replicas, while harmless inputs follow the original FFN. The method achieves a detoxification success rate of nearly 98% and 95% instruction-following retention (DL metric) on LLaMA3-8B/LLaMA2-7B/Mistral-7B, addressing the key tension between detoxification and over-editing in editing-based detoxification.
BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning: Proposes BMIKE-53, a cross-lingual benchmark covering 53 languages and integrating three knowledge editing datasets (zsRE, CounterFact, and WikiFactDiff). It systematically evaluates in-context knowledge editing methods from zero-shot to 8-shot settings, revealing that writing systems (Latin vs. non-Latin) are more decisive than language families for cross-lingual editing performance, and that metric-specific exemplar strategies significantly outperform hybrid configurations.
ChainEdit: Propagating Ripple Effects in LLM Knowledge Editing through Logical Rule-Guided Chains: This paper proposes ChainEdit, a framework that aligns logical rules mined from knowledge graphs with the intrinsic logical reasoning capabilities of LLMs to propagate edits along rule-guided chains, improving logical generalization accuracy from ~20% to 58-65%.
CKnowEdit: A New Chinese Knowledge Editing Dataset for Linguistics, Facts, and Logic Error Correction in LLMs: Constructs CKnowEdit, the first knowledge editing dataset oriented towards Chinese linguistic characteristics. Its 1,854 samples cover three major categories: linguistics (pinyin, ancient poetry, classical Chinese, idioms, proverbs), facts (history and geography), and logical traps (homophones, reasoning, wordplay). It systematically evaluates five mainstream knowledge editing methods on four Chinese LLMs, revealing editing challenges unique to Chinese.
CompKe: Complex Question Answering under Knowledge Editing: Proposes the CompKe benchmark—containing 11,924 complex questions—to evaluate the performance of knowledge editing methods in complex reasoning scenarios involving one-to-many relations, logical operations (intersection/union), and condition confirmation, revealing the significant deficiencies of existing methods in complex question answering.
Context-Robust Knowledge Editing for Language Models: This work finds that existing knowledge editing methods degrade sharply when a prefix context precedes the prompt (editing success rates drop from 90.9% to 69.1%). It introduces the CHED benchmark to evaluate context robustness and designs CoRE, a method that improves the context robustness of edits through diversified prefix contexts and cross-prefix hidden state variance regularization. CoRE significantly narrows the performance gap between settings with and without context while maintaining general model capabilities.
DocMEdit: Towards Document-Level Model Editing: This paper proposes the document-level model editing task for the first time and constructs the DocMEdit benchmark containing 37,990 data items and 105,652 editing facts, revealing the severe shortcomings of existing editing methods in long-context, multi-fact parallel editing scenarios.
Efficient Knowledge Editing via Minimal Precomputation: Demonstrates that the precomputation step (caching 44 million hidden vectors) for knowledge editing methods like MEMIT/ROME/EMMET can be reduced to 2-10 times the theoretical minimum (less than 0.3% of the original size), reducing precomputation time from dozens of hours to minutes with virtually no loss in editing performance.
Memorizing is Not Enough: Deep Knowledge Injection Through Reasoning: Proposes a four-level knowledge injection framework (Memorization → Retrieval → Reasoning → Association) and builds the DeepKnowledge synthetic evaluation platform. It systematically reveals the key factors for each level of knowledge injection: repetitive learning for memorization, diverse expressions for retrieval, and explicit reasoning patterns for deep reasoning and association, providing a complete method-level mapping for LLM knowledge updates.
Mitigating Negative Interference in Multilingual Sequential Knowledge Editing through Null-Space Constraints: This paper proposes the LangEdit framework, which projects parameter updates of each language onto the null space of previously edited languages to achieve mathematical isolation between language updates in multilingual sequential knowledge editing, effectively mitigating negative interference and preserving multilingual generalization capability.
Neuron-Level Sequential Editing for Large Language Models: This work proposes the NSE method for sequential model editing in LLMs, which prevents model collapse through weights rewinding, mitigates model forgetting via activation-based neuron-level selective weight updates, and improves the success rate of large-scale knowledge updates through iterative multi-layer editing.
REP: Keys to Robust Edits — From Theoretical Insights to Practical Advances: Reveals a fundamental flaw in the semantic keys of locate-and-edit knowledge editing methods—internal representations cannot simultaneously satisfy robustness and specificity. The paper proposes the REP module to disentangle editing keys via contrastive learning, achieving up to a 66.4% improvement in robustness tests.
Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing: This paper defines the phenomenon of "superficial editing"—where models modified by knowledge editing algorithms perform well under standard prompts but revert to the original knowledge under hand-crafted adversarial probes—and reveals through mechanistic analysis that the residual stream in early layers and specific attention heads in late layers are two crucial factors causing this phenomenon.
SAKE: Steering Activations for Knowledge Editing: SAKE models knowledge editing as a distribution mapping problem in activation space. It builds source and target activation distributions from a set of paraphrased and logically entailed prompts for the edited fact, then replaces activation vectors using a linear map derived from optimal transport. SAKE edits facts more robustly than methods such as ROME/MEMIT, with clear gains in logical entailment generalization and context robustness.
ScEdit: Script-based Assessment of Knowledge Editing: The authors propose ScEdit, a script-based evaluation benchmark for knowledge editing, which extends traditional "What"-style factual recall evaluation to "How"-style procedural reasoning. It introduces a two-tier evaluation system at both token and text levels, revealing significant limitations of existing knowledge editing methods in practical application scenarios.
Structure-aware Domain Knowledge Injection for Large Language Models: This paper proposes StructTuning, which automatically extracts the taxonomic knowledge structure of the training corpus and designs a two-stage strategy: Structure-aware Continual Pre-training (SCPT) and Structure-aware Supervised Fine-tuning (SSFT). It achieves 100% of the domain knowledge injection performance of traditional methods while using only 5% of the data.
The Mirage of Model Editing: Revisiting Evaluation in the Wild: This paper reveals systematic flaws in the evaluation practices of the model editing field—the near-perfect success rates (~96.8%) reported by prior methods plunge to 38.5% in real-world application scenarios. The root cause is the leakage of ground-truth information through teacher forcing during testing. The authors propose the QAEdit benchmark and the WILD evaluation framework to foster more reliable evaluations.
Towards a Principled Evaluation of Knowledge Editors: This paper systematically shows that different scoring methods (argmax, multiple-choice, generation match) and different edit batch sizes in knowledge editing evaluation can reverse the ranking of knowledge editors. A human evaluation further finds that string-match-based evaluations are prone to false positives.
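
The routing idea in the ToxEdit entry can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `linear_score` stands in for a trained linear SVM decision function, and `original_ffn` / `edited_ffn` are toy placeholders for the original FFN and its edited replica. It is not the paper's implementation.

```python
def linear_score(hidden_state, weights, bias):
    """Decision value of a linear classifier (stand-in for a trained linear SVM)."""
    return sum(w * h for w, h in zip(weights, hidden_state)) + bias


def routed_ffn(hidden_state, weights, bias, original_ffn, edited_ffn):
    """Send inputs flagged as harmful to the edited FFN, all others to the original."""
    is_harmful = linear_score(hidden_state, weights, bias) > 0.0
    ffn = edited_ffn if is_harmful else original_ffn
    return ffn(hidden_state), is_harmful


# Hypothetical toy FFNs: the original doubles the input, the edited replica zeroes it.
def original_ffn(h):
    return [2.0 * x for x in h]


def edited_ffn(h):
    return [0.0 for _ in h]


# Hypothetical classifier parameters; a real system would fit them on labeled hidden states.
weights, bias = [1.0, -1.0], 0.0

benign_out, benign_flag = routed_ffn([0.2, 0.9], weights, bias, original_ffn, edited_ffn)
harmful_out, harmful_flag = routed_ffn([0.9, 0.2], weights, bias, original_ffn, edited_ffn)
```

Because harmless inputs never touch the edited replica, the original FFN's behavior on them is unchanged by construction, which is the property the entry attributes to the routing design.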
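
The cross-prefix variance regularization mentioned in the Context-Robust Knowledge Editing entry can be sketched roughly as follows. The exact form of the regularizer is an assumption: here it is the per-dimension variance of the hidden state across prefixes, averaged over dimensions, and `edit_loss` and `lam` are hypothetical names.

```python
def cross_prefix_variance(hidden_states):
    """Mean per-dimension variance of hidden states collected under different prefixes.

    hidden_states: list of equal-length vectors, one per prefix context.
    """
    n = len(hidden_states)
    d = len(hidden_states[0])
    means = [sum(h[j] for h in hidden_states) / n for j in range(d)]
    total = sum((h[j] - means[j]) ** 2 for h in hidden_states for j in range(d))
    return total / (n * d)


def regularized_loss(edit_loss, hidden_states, lam=0.1):
    """Edit objective plus a penalty on how much the hidden state moves across prefixes."""
    return edit_loss + lam * cross_prefix_variance(hidden_states)


# Identical hidden states under every prefix incur no penalty.
stable = cross_prefix_variance([[1.0, 2.0], [1.0, 2.0]])
# Hidden states that shift with the prefix are penalized.
unstable = cross_prefix_variance([[0.0, 0.0], [2.0, 2.0]])
loss = regularized_loss(0.5, [[0.0, 0.0], [2.0, 2.0]], lam=0.1)
```

A penalty of this shape pushes the edited representation to stay the same regardless of what text precedes the prompt.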
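
The null-space constraint in the multilingual sequential editing entry (LangEdit) can be illustrated with a small linear-algebra sketch. It assumes a weight update `delta_w` acting on key vectors and removes every component of the update that would change the outputs for keys from previously edited languages. The helper names are hypothetical, and how the paper actually builds the null-space projector may differ.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project_update_to_null_space(delta_w, previous_keys):
    """Project each row of delta_w onto the orthogonal complement of the previous keys."""
    basis = orthonormal_basis(previous_keys)
    projected = []
    for row in delta_w:
        r = list(row)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        projected.append(r)
    return projected


def apply(matrix, vector):
    return [dot(row, vector) for row in matrix]


# Hypothetical keys from an earlier language and a raw update for a new language.
previous_keys = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
delta_w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

safe_update = project_update_to_null_space(delta_w, previous_keys)
effect_on_old_key = apply(safe_update, previous_keys[1])
```

After projection, the update produces a zero change for every previous key, so earlier edits are untouched while directions outside their span remain free for the new language.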
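
The teacher-forcing leak described in the Mirage of Model Editing entry can be made concrete with a toy comparison. The "model" below is a hypothetical lookup table that predicts from the last token only; it is a stand-in for illustration, not the QAEdit or WILD code.

```python
def toy_next_token(prefix):
    """Hypothetical edited model: predicts the next token from the last token only."""
    table = {
        "<q>": "Los",
        "New": "York",
        "York": "City",
        "Los": "Angeles",
        "Angeles": "<eos>",
    }
    return table.get(prefix[-1], "<eos>")


def teacher_forced_accuracy(next_token, prompt, target):
    """Token accuracy when each step is conditioned on the ground-truth prefix."""
    hits = 0
    for i in range(len(target)):
        prefix = prompt + target[:i]  # ground-truth tokens leak into the input
        hits += next_token(prefix) == target[i]
    return hits / len(target)


def autoregressive_exact_match(next_token, prompt, target, max_new_tokens=8):
    """Exact match when the model must continue from its own previous outputs."""
    generated = []
    for _ in range(max_new_tokens):
        token = next_token(prompt + generated)
        if token == "<eos>":
            break
        generated.append(token)
    return generated == target


prompt = ["<q>"]
target = ["New", "York", "City"]

tf_accuracy = teacher_forced_accuracy(toy_next_token, prompt, target)
exact = autoregressive_exact_match(toy_next_token, prompt, target)
```

In this toy case the model gets the first token wrong, yet teacher forcing still credits it with two of three tokens because the correct prefix is fed back in; free generation produces a different answer entirely and fails exact match.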