Aligning Language Models with Real-time Knowledge Editing¶

Conference: ACL 2026
arXiv: 2508.01302
Code: GitHub
Area: Knowledge Editing
Keywords: Real-time Knowledge Editing, Knowledge Alignment, Dataset Contamination, Diverse Augmentation, Self-adaptive Inference

TL;DR¶

This paper introduces CRAFT (a continuously updated Chinese financial knowledge editing dataset) and KEDAS (a knowledge editing alignment paradigm based on diverse edit augmentation and self-adaptive inference), addressing the problem that existing knowledge editing methods cannot simultaneously achieve high editing success rate, locality, and portability in real-time scenarios.

Background & Motivation¶

Background: Knowledge editing aims to efficiently modify outdated knowledge in language models without full retraining. However, mainstream evaluation datasets (ZsRE, MQuAKE, RippleEdits) are static and cannot be updated once published.

Limitations of Prior Work: (1) Static datasets suffer from serious data leakage — most knowledge has already been seen by LMs during pre-training, leading to unfair evaluation; (2) WikiBigEdit is real-time but requires processing hundreds of GB of Wiki data with severe sparsity; (3) Existing methods struggle to balance editing success rate, locality, and portability.

Key Challenge: Parameter modification methods (e.g., ROME, WISE) suffer severe model degradation during sequential editing; retrieval methods (e.g., IKE, EREN) exhibit unstable performance due to lack of alignment; alignment methods (e.g., LTE) have poor locality due to overfitting.

Goal: Build a continuously updated, contamination-free real-time knowledge editing dataset, and propose a knowledge editing method that performs well across all metrics.

Key Insight: Leverage publicly available Chinese official financial statistics (continuously updated and unseen by LMs) for dataset construction, and redefine knowledge editing as an LM alignment problem.

Core Idea: A one-time offline alignment (LoRA fine-tuning) gives the LM the ability to apply edits supplied in its prompt. At inference time, self-adaptive routing decides whether a query goes to the original model or the aligned model, which addresses the locality problem at its root.

Method¶

Overall Architecture¶

KEDAS has two stages: (1) Offline alignment — LoRA fine-tuning on knowledge editing format data to endow the LM with the ability to update answers using edit prompts; (2) Online editing — storing new knowledge in diverse forms in memory, with intelligent retrieval and self-adaptive model path selection during inference.

Key Designs¶

CRAFT Dataset:
- Function: Provide a continuously updated, contamination-free real-time knowledge editing evaluation benchmark
- Mechanism: Leverages publicly available Chinese official financial and statistical data (e.g., GDP, population), designing paired edits as compositional reasoning tests. Supports alias portability, temporal locality, and common-sense locality evaluation
- Design Motivation: Official statistical data is continuously updated, guaranteeing data freshness; paired edit design tests the model's ability to integrate multiple edits (compositional portability)
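
The paired-edit idea can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class names `Edit` and `PairedEdit`, the field names, the question templates, and the "combined value" probe are hypothetical and are not taken from CRAFT's actual schema; the region names and numbers are made-up placeholders, not real statistics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One piece of new knowledge (hypothetical schema, not CRAFT's)."""
    subject: str     # entity the statistic is about
    indicator: str   # statistical indicator, e.g. "GDP"
    period: str      # reporting period
    value: float     # newly published value

    def question(self) -> str:
        return f"What was the {self.indicator} of {self.subject} in {self.period}?"


@dataclass(frozen=True)
class PairedEdit:
    """Two edits that a compositional probe must combine."""
    first: Edit
    second: Edit

    def compositional_probe(self) -> tuple[str, float]:
        # Answerable only if the model has integrated BOTH edits.
        question = (
            f"What was the combined {self.first.indicator} of "
            f"{self.first.subject} and {self.second.subject} in {self.first.period}?"
        )
        return question, self.first.value + self.second.value


# Placeholder values for illustration only.
pair = PairedEdit(
    Edit("Region A", "GDP", "2024", 120.0),
    Edit("Region B", "GDP", "2024", 80.0),
)
probe_question, probe_answer = pair.compositional_probe()
```

A probe built this way cannot be answered from either edit alone, which is the property the paired design is meant to test.
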
Diverse Edit Augmentation:
- Function: Enhance coverage and retrieval robustness of edit memory
- Mechanism: Each edit is stored in multiple expression forms — including original QA pairs, paraphrased versions, alias versions, etc., increasing retrieval hit rate
- Design Motivation: User queries may express the same knowledge need in various ways; a single-form edit may lead to retrieval failure
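
The effect of multi-form storage on retrieval can be sketched as follows. The assumptions are significant: `EditMemory` is a hypothetical class, the paraphrases and aliases are written by hand here, and token-overlap (Jaccard) similarity with a fixed threshold stands in for whatever retriever the paper actually uses.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a crude stand-in for a real text encoder."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class EditMemory:
    """Stores every surface form of an edit under the same edit id."""
    entries: list[tuple[str, str]] = field(default_factory=list)  # (form, edit_id)
    answers: dict[str, str] = field(default_factory=dict)         # edit_id -> answer

    def add(self, edit_id, question, answer, paraphrases=(), aliases=()):
        self.answers[edit_id] = answer
        for form in (question, *paraphrases, *aliases):
            self.entries.append((form, edit_id))

    def retrieve(self, query: str, threshold: float = 0.5):
        """Return the id of the best-matching edit, or None below the threshold."""
        q = tokens(query)
        best_id, best_score = None, 0.0
        for form, edit_id in self.entries:
            f = tokens(form)
            score = len(q & f) / len(q | f) if q | f else 0.0
            if score > best_score:
                best_id, best_score = edit_id, score
        return best_id if best_score >= threshold else None


question = "What was the GDP of Region A in 2024?"
query = "How large was Region A's economy in 2024?"

single = EditMemory()
single.add("e1", question, "120")

multi = EditMemory()
multi.add("e1", question, "120",
          paraphrases=["How large was Region A's economy in 2024?"],
          aliases=["What was the GDP of RA in 2024?"])
```

With these placeholder strings, `single.retrieve(query)` misses the edit while `multi.retrieve(query)` finds it, which is the retrieval failure that single-form storage risks.
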
Self-adaptive Inference Path:
- Function: Dynamically select whether to activate the LoRA-aligned model at inference time
- Mechanism: A filter-augmented intelligent retriever determines whether the query is related to any edit. If related, it is processed through the LoRA-aligned model (with edit context provided); if unrelated, the original LM is used directly
- Design Motivation: Unrelated queries are answered by the completely unmodified original model, which addresses the locality problem at its root and avoids the knowledge forgetting caused by overfitting
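
The routing decision described above reduces to a single branch. The sketch below assumes the retriever, the original LM, and the LoRA-aligned LM are available as plain callables; the function name `route_and_answer` and the `[Edit]`/`[Query]` prompt layout are hypothetical and are not the paper's actual interface.

```python
from typing import Callable, Optional


def route_and_answer(
    query: str,
    retrieve: Callable[[str], Optional[str]],  # returns a relevant edit or None
    base_lm: Callable[[str], str],             # original, unmodified model
    aligned_lm: Callable[[str], str],          # LoRA-aligned model
) -> str:
    """Send edit-related queries to the aligned model, all others to the base model."""
    edit = retrieve(query)
    if edit is None:
        # Unrelated query: the original model answers, untouched by any edit.
        return base_lm(query)
    # Related query: give the aligned model the retrieved edit as context.
    prompt = f"[Edit] {edit}\n[Query] {query}"
    return aligned_lm(prompt)
```

Because the base path never sees the adapter or the edit memory, any locality error in this sketch can only come from the retriever wrongly flagging a query as related.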

Loss & Training¶

The alignment stage uses LoRA fine-tuning on training data that contains both in-scope and out-of-scope queries. After this one-time alignment, subsequent edits operate only on the memory and require no parameter modification.
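
One plausible way to build such mixed training data is sketched below. It is an assumption, not the paper's recipe: the helper `build_example`, the prompt layout, and the rule that out-of-scope examples keep the original answer as their target are all hypothetical.

```python
def build_example(edit_text, query, new_answer, original_answer, in_scope):
    """Build one supervised example for the alignment stage (hypothetical format).

    In-scope: the edit applies to the query, so the target is the new answer.
    Out-of-scope: the edit is irrelevant, so the target stays the original answer.
    """
    prompt = f"[Edit] {edit_text}\n[Query] {query}"
    target = new_answer if in_scope else original_answer
    return {"prompt": prompt, "target": target, "in_scope": in_scope}


# Placeholder strings for illustration only.
edit_text = "The GDP of Region A in 2024 is 120."
training_set = [
    build_example(edit_text, "What was the GDP of Region A in 2024?",
                  new_answer="120", original_answer="100", in_scope=True),
    build_example(edit_text, "What is the capital of France?",
                  new_answer="120", original_answer="Paris", in_scope=False),
]
```

Mixing both kinds of example is what would teach the aligned model to use an edit when it applies and to ignore it when it does not.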

Key Experimental Results¶

Main Results¶

| Method | Edit Success | Locality | Portability | Overall |
| --- | --- | --- | --- | --- |
| ROME (Parameter Modification) | High → Degraded | Poor | Poor | Unbalanced |
| IKE (Retrieval) | Medium | Medium | Medium | Unstable |
| LTE (Alignment) | High | Poor (Overfitting) | Medium | Unbalanced |
| KEDAS (Ours) | High | High | High | Excellent across all |
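
The three metric columns can be made concrete with a small scoring sketch. It uses the definitions common in the knowledge editing literature and exact-match scoring; these are assumptions, and the paper's precise metric definitions may differ. All function names and sample strings are hypothetical.

```python
from typing import Sequence


def match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of positions where the prediction exactly matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(predictions)


def edit_success(post_edit_answers, new_targets):
    """Edited queries answered with the new target."""
    return match_rate(post_edit_answers, new_targets)


def locality(post_edit_answers, pre_edit_answers):
    """Unrelated queries whose answer is unchanged by editing."""
    return match_rate(post_edit_answers, pre_edit_answers)


def portability(post_edit_answers, derived_targets):
    """Queries that require reasoning over the edit (aliases, compositions)."""
    return match_rate(post_edit_answers, derived_targets)


# Placeholder answers for illustration only.
scores = {
    "edit_success": edit_success(["120", "80"], ["120", "80"]),
    "locality": locality(["Paris", "4"], ["Paris", "5"]),
    "portability": portability(["200"], ["200"]),
}
```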

Ablation Study¶

| Config | Metric | Note |
| --- | --- | --- |
| Data leakage analysis | CRAFT exposure rate ≈ 0 | Traditional datasets largely seen by LMs |
| Remove diverse augmentation | Retrieval recall decreases | Multi-form storage improves robustness |
| Remove self-adaptive inference | Locality drops | Routing mechanism is key to locality guarantee |

Key Findings¶

- Knowledge leakage in existing datasets is severe: the exposure rates of five LMs on traditional datasets far exceed their exposure rates on CRAFT
- Parameter modification methods such as ROME degrade rapidly during sequential editing and fail to meet real-time editing requirements
- KEDAS significantly outperforms all baselines on both CRAFT and traditional datasets, and is reported as the first method to achieve balanced performance across all metrics

Highlights & Insights¶

- Exposing the data leakage problem is a warning for the knowledge editing field: evaluation results on static datasets may not reflect true editing capability
- The "align once, edit forever" paradigm cleanly separates alignment cost from editing flexibility
- The self-adaptive inference path resolves the edit-vs-no-edit trade-off: after alignment, edits take effect without any parameter modification

Limitations & Future Work¶

- CRAFT currently covers only Chinese and the financial/statistical domain; generalization to other languages and domains requires further validation
- The quality of the self-adaptive retriever is the performance bottleneck of the whole system
- Memory management efficiency at extremely large scales (e.g., millions of edits) is not discussed
- Future directions include more efficient alignment strategies and cross-lingual real-time editing

Related Work & Insights¶

- vs ROME/MEMIT: Parameter modification methods degrade during sequential editing; KEDAS avoids this by keeping edits in external memory
- vs LTE: Both are alignment methods, but KEDAS resolves LTE's overfitting problem through self-adaptive inference paths
- vs RAG: KEDAS not only retrieves edits but also aligns the LM's ability to use them, making it more effective than pure RAG

Rating¶

Novelty: ⭐⭐⭐⭐ Both CRAFT dataset and KEDAS paradigm are innovative
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Dual validation on CRAFT + traditional datasets, data leakage analysis
Writing Quality: ⭐⭐⭐⭐ Problem definition is clear, method description is systematic
Value: ⭐⭐⭐⭐⭐ Dual contribution of dataset and methodology to the knowledge editing field

title: >- [Paper Note] Aligning Language Models with Real-time Knowledge Editing description: >- [ACL 2026][Knowledge Editing][Real-time Editing] CRAFT dataset (continuously updated Chinese financial knowledge) and KEDAS paradigm (diverse edit augmentation + self-adaptive inference) for balanced knowledge editing performance. tags: - ACL 2026 - Real-time Knowledge Editing - Knowledge Editing - Dataset Contamination - Diverse Augmentation - Self-Adaptive Inference date: 2025-04-17 content_hash: f37b76ed1e42eee6

