EA-Agent: A Structured Multi-Step Reasoning Agent for Entity Alignment
Conference: ACL 2026 | arXiv: 2604.11686 | Code: GitHub | Area: LLM Agent | Keywords: entity alignment, knowledge graph, multi-step reasoning, tool planning, reward-guided optimization
TL;DR
This paper proposes EA-Agent, which decomposes entity alignment (EA) into a structured multi-step reasoning process. By planning and executing over a tool pool (triple selectors + alignment tool + reflector), EA-Agent produces interpretable alignment decisions. Combined with reward-guided offline policy optimization that continuously improves planning capability, it achieves up to a 3.17% Hits@1 improvement on DBP15K while mitigating the efficiency problems caused by redundant triples.
Background & Motivation
Background: Entity alignment is a foundational technique for knowledge fusion, aiming to identify nodes across different knowledge graphs that refer to the same real-world entity. Traditional approaches rely on knowledge representation learning (e.g., TransE, GCN-Align), but exhibit limited performance under noisy or sparsely supervised settings. Recent LLM-based methods (e.g., ChatEA, LLMEA) have improved performance by leveraging semantic understanding.
Limitations of Prior Work: (1) Existing LLM-based EA methods treat LLMs as black-box decision makers, lacking interpretability—it is difficult to determine which information is critical for alignment decisions; (2) directly feeding large numbers of attribute and relation triples leads to excessively long prompts and high inference costs; (3) many triples are redundant or noisy, which interferes with decision-making.
Key Challenge: There is a need to leverage LLMs' powerful semantic understanding while simultaneously addressing the issues of black-box opacity and efficiency under large-scale triple inputs.
Goal: To design a reasoning-driven agent framework that achieves interpretable, controllable, and efficient entity alignment through multi-step tool planning and execution.
Key Insight: EA is framed as a multi-step decision problem—first selecting the most informative triples, then making alignment decisions, and finally performing reflective verification under uncertainty.
Core Idea: A tool pool (attribute/relation triple selectors + alignment tool + reflector) + path planning + reward-guided offline policy optimization.
Method
Overall Architecture
Three stages: (1) Path Planning: the agent autonomously plans a tool invocation path based on structural features of the source entity and candidate similarity scores; (2) Tool Invocation: triple selection, alignment decision, and reflective verification are executed in the planned order; (3) Agent Optimization: a reward function evaluates path quality → offline policy update → iterative improvement.
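Read as pseudocode, the three stages compose into a simple plan-execute-optimize loop. The sketch below is an illustration under assumed interfaces (plan, run, score, and update are hypothetical names, not the authors' API):

```python
def run_ea_agent(source_entity, candidates, policy, tools, optimizer):
    # (1) Path planning: the policy chooses a tool-invocation path from the
    # source entity's structural features and candidate similarity scores.
    path = policy.plan(source_entity, candidates)

    # (2) Tool invocation: triple selectors, the alignment tool, and
    # (conditionally) the reflector run in the planned order.
    context = {"entity": source_entity, "candidates": candidates}
    for tool_name in path:
        context = tools[tool_name].run(context)

    # (3) Agent optimization: a reward scores the executed path, and the
    # planning policy is updated offline from the scored paths.
    reward = optimizer.score(path, context)
    optimizer.update(policy, path, reward)
    return context["prediction"]
```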
Key Designs
- Attribute/Relation Triple Selector:
- Function: filters redundant triples before LLM reasoning, retaining the most discriminative information.
- Mechanism: The attribute selector applies an entropy-based criterion \(H(a) = -\sum_v p(v)\log p(v)\): attributes whose values are spread more uniformly across entities (higher entropy) are more discriminative. The relation selector uses inverse-frequency weighting \(I(r) = \log(N/(\text{freq}(r)+1))\): rarer relations are more discriminative. Predefined important attributes are always preserved. (A code sketch of both criteria follows this list.)
- Design Motivation: Feeding all triples indiscriminately wastes tokens and introduces noise. The selector acts as an information bottleneck, retaining only critical signals.
- Reward-Guided Path Optimization:
- Function: continuously improves the agent's tool planning strategy.
- Mechanism: The reward function \(\gamma = \gamma_\mu + c \cdot \gamma_{\text{ref}} + \gamma_e\) comprises three components: (1) alignment correctness \(\gamma_\mu\) (the primary signal); (2) reflection quality \(\gamma_{\text{ref}}\), which rewards successful corrections, penalizes erroneous modifications, and lightly penalizes redundant reflection; (3) path efficiency \(\gamma_e = e^{-\beta \cdot l}\), which penalizes excessively long paths. The policy is optimized via reward-guided offline SFT over rewritten paths. (A sketch of the reward computation follows this list.)
- Design Motivation: Single-round planning may produce redundant or inefficient paths. The closed-loop cycle of "plan → execute → evaluate → update" ensures continuous policy improvement.
- Reflector (conditionally activated):
- Function: verifies and corrects uncertain alignment results.
- Mechanism: An LLM-based module activated only when the candidate similarity scores indicate ambiguity (the trigger condition is sketched after this list). It re-evaluates the candidates based on the prior context and provides a revised prediction.
- Design Motivation: Not all alignments require reflection—direct decision-making is more efficient for straightforward cases, and additional verification is triggered only under uncertainty.
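To make the two selection criteria concrete, here is a minimal, runnable sketch of the selectors. The formulas come from the paper; everything else (function names, the top-k cutoff, the data layout) is an assumption for illustration:

```python
import math
from collections import Counter

def attribute_entropy(values):
    """H(a) = -sum_v p(v) log p(v), over the values attribute a takes
    across entities; higher entropy = more discriminative."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def select_attributes(entity_attrs, attr_values, k=5, must_keep=()):
    """Keep the k highest-entropy attributes, plus any predefined important
    attributes (k and must_keep are assumed hyperparameters)."""
    ranked = sorted(entity_attrs,
                    key=lambda a: attribute_entropy(attr_values[a]),
                    reverse=True)
    return set(ranked[:k]) | (set(must_keep) & set(entity_attrs))

def select_relations(entity_rels, rel_freq, n_triples, k=5):
    """Rank relations by I(r) = log(N / (freq(r) + 1)); rarer is better."""
    ranked = sorted(entity_rels,
                    key=lambda r: math.log(n_triples / (rel_freq[r] + 1)),
                    reverse=True)
    return ranked[:k]
```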
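The three-component reward can be written down almost directly from the formula. In the sketch below, the discrete values for \(\gamma_{\text{ref}}\) and the constants c and β are assumptions; the paper specifies the structure, not these numbers:

```python
import math

def path_reward(aligned_correctly, reflection_outcome, path_length,
                c=0.5, beta=0.1):
    """gamma = gamma_mu + c * gamma_ref + gamma_e (component values assumed)."""
    # gamma_mu: alignment correctness, the primary term.
    gamma_mu = 1.0 if aligned_correctly else 0.0

    # gamma_ref: reward successful corrections, penalize erroneous
    # modifications, lightly penalize redundant reflection.
    gamma_ref = {None: 0.0,          # reflector not invoked
                 "corrected": 1.0,   # fixed a wrong prediction
                 "broke": -1.0,      # overwrote a correct prediction
                 "redundant": -0.2,  # ran but changed nothing useful
                 }[reflection_outcome]

    # gamma_e = exp(-beta * l): exponentially discount long tool paths.
    gamma_e = math.exp(-beta * path_length)

    return gamma_mu + c * gamma_ref + gamma_e
```

Because \(\gamma_e\) decays exponentially in path length, two paths that reach the same answer are separated by how few tool calls they used.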
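Finally, the reflector's conditional activation can be realized as a margin test over candidate scores. The paper states only that reflection fires when similarity scores indicate ambiguity; the top-2 margin heuristic and threshold below are assumptions:

```python
def should_reflect(candidate_scores, margin=0.1):
    """Trigger reflection only when the top two candidates are too close
    to call (margin is a hypothetical threshold)."""
    if len(candidate_scores) < 2:
        return False  # a single candidate leaves nothing to disambiguate
    top1, top2 = sorted(candidate_scores, reverse=True)[:2]
    return top1 - top2 < margin
```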
Key Experimental Results
Main Results (DBP15K)
| Method | FR-EN Hits@1 | JA-EN Hits@1 | ZH-EN Hits@1 |
|---|---|---|---|
| GCN-Align | ~40 | ~40 | ~40 |
| TEA | ~90 | ~90 | ~85 |
| ChatEA | ~92 | ~91 | ~88 |
| EA-Agent | ~95 | ~94 | ~91 |
Ablation Study
| Configuration | Observed Effect |
|---|---|
| w/o triple selection | Token consumption increases substantially; slight performance degradation |
| w/o reflector | Error rate increases on uncertain cases |
| w/o path optimization | Planning strategy becomes unstable; more redundant tool invocations |
| Full EA-Agent | Optimal performance + highest efficiency |
Key Findings
- EA-Agent achieves state-of-the-art on all datasets, with up to 3.17% improvement in Hits@1 and consistent MRR gains.
- The triple selector significantly reduces token consumption while maintaining or even improving performance—confirming that a large proportion of triples are indeed redundant.
- Path optimization substantially improves planning quality: across three optimization iterations, both path efficiency and alignment accuracy improve steadily.
- Conditional activation of the reflector is the optimal strategy: always-on activation is inferior to on-demand activation.
- Interpretability: every alignment decision is traceable to a specific tool invocation path and the corresponding key triples.
Highlights & Insights
- Framing EA as a multi-step tool planning problem opens up the agent paradigm for knowledge graph tasks.
- The three-component reward function design is highly practical: it balances correctness, reflection quality, and efficiency, avoiding optimization bias toward any single objective.
- The triple selector's use of information-theoretic criteria (entropy and inverse frequency) is a simple yet effective approach that can be directly transferred to other KG tasks.
Limitations & Future Work
- The approach depends on TEA to generate initial candidate lists, so candidate quality caps overall performance.
- Path optimization requires multiple iterations, leading to relatively high training costs.
- Validation is conducted only on cross-lingual EA; monolingual or cross-domain EA remains unexplored.
- The tool pool is manually designed; whether new tools can be discovered automatically is an open question.
- The reflector's judgments may introduce new hallucinations.
Related Work & Insights
- vs. ChatEA: ChatEA formats KG structure using code, but alignment decisions remain black-box. EA-Agent achieves interpretable decisions through tool planning.
- vs. LLMEA: LLMEA directly inputs all triples, whereas EA-Agent selects before aligning, yielding higher efficiency.
- vs. general agent frameworks: EA-Agent specializes the agent paradigm for KG tasks; both the tool design and reward function are task-specific.
Rating
- Novelty: ⭐⭐⭐⭐ Introducing the agent paradigm into EA is novel, though individual components (tool planning, LoRA fine-tuning) are not new.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ 3 datasets + 10 baselines + ablation + efficiency analysis + interpretability case studies.
- Writing Quality: ⭐⭐⭐⭐ RQ-driven structure; formalization is clear.
- Value: ⭐⭐⭐⭐ Provides methodological inspiration for LLM applications in the KG domain.