EA-Agent: A Structured Multi-Step Reasoning Agent for Entity Alignment

Conference: ACL 2026 arXiv: 2604.11686 Code: GitHub Area: LLM Agent Keywords: entity alignment, knowledge graph, multi-step reasoning, tool planning, reward-guided optimization

TL;DR

This paper proposes EA-Agent, which decomposes entity alignment (EA) into a structured multi-step reasoning process. By planning and executing over a tool pool (triple selectors + alignment tool + reflector), EA-Agent produces interpretable alignment decisions. Combined with reward-guided offline policy optimization that continuously improves its planning capability, it achieves up to a 3.17% Hits@1 improvement on DBP15K while mitigating the inefficiency caused by redundant triples.

Background & Motivation

Background: Entity alignment is a foundational technique for knowledge fusion, aiming to identify nodes across different knowledge graphs that refer to the same real-world entity. Traditional approaches rely on knowledge representation learning (e.g., TransE, GCN-Align), but exhibit limited performance under noisy or sparsely supervised settings. Recent LLM-based methods (e.g., ChatEA, LLMEA) have improved performance by leveraging semantic understanding.

Limitations of Prior Work: (1) Existing LLM-based EA methods treat LLMs as black-box decision makers, lacking interpretability—it is difficult to determine which information is critical for alignment decisions; (2) directly feeding large numbers of attribute and relation triples leads to excessively long prompts and high inference costs; (3) many triples are redundant or noisy, which interferes with decision-making.

Key Challenge: There is a need to leverage LLMs' powerful semantic understanding while simultaneously addressing the issues of black-box opacity and efficiency under large-scale triple inputs.

Goal: To design a reasoning-driven agent framework that achieves interpretable, controllable, and efficient entity alignment through multi-step tool planning and execution.

Key Insight: EA is framed as a multi-step decision problem—first selecting the most informative triples, then making alignment decisions, and finally performing reflective verification under uncertainty.

Core Idea: A tool pool (attribute/relation triple selectors + alignment tool + reflector) + path planning + reward-guided offline policy optimization.

Method

Overall Architecture

Three stages: (1) Path Planning: the agent autonomously plans a tool invocation path based on structural features of the source entity and candidate similarity scores; (2) Tool Invocation: triple selection, alignment decision, and reflective verification are executed in the planned order; (3) Agent Optimization: a reward function evaluates path quality → offline policy update → iterative improvement.
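The three-stage loop can be sketched in Python. Everything here is illustrative: the tool interface, function names (`plan_path`, `align_entity`, the stub tools), and the ambiguity threshold are our assumptions, not the paper's published implementation.

```python
# Hypothetical sketch of the plan -> invoke -> (optimize) loop.
# All names and the threshold value are illustrative, not from the paper.

def select_triples(ctx):
    # Stub for the triple selectors: keep only the top few triples.
    ctx["triples"] = ctx["triples"][:3]
    return ctx

def align(ctx):
    # Stub alignment decision: pick the highest-similarity candidate.
    ctx["prediction"], ctx["score"] = max(ctx["candidates"], key=lambda c: c[1])
    return ctx

def reflect(ctx):
    # Stub reflector: re-check an uncertain decision (here, a no-op flag).
    ctx["reflected"] = True
    return ctx

TOOLS = {"select": select_triples, "align": align, "reflect": reflect}

def plan_path(ctx, ambiguity_threshold=0.1):
    # (1) Path Planning: include the reflector only when the top two
    # candidate similarity scores are close (conditional activation).
    scores = sorted((s for _, s in ctx["candidates"]), reverse=True)
    ambiguous = len(scores) > 1 and scores[0] - scores[1] < ambiguity_threshold
    return ["select", "align", "reflect"] if ambiguous else ["select", "align"]

def align_entity(entity, candidates, triples):
    ctx = {"entity": entity, "candidates": candidates, "triples": triples}
    path = plan_path(ctx)          # (1) plan a tool invocation path
    for tool in path:              # (2) execute tools in the planned order
        ctx = TOOLS[tool](ctx)
    # (3) the (prediction, path) trace would feed the reward function
    # that drives offline policy updates.
    return ctx["prediction"], path
```

A clear-cut case (large score gap) skips reflection, while a near-tie triggers the longer `select → align → reflect` path; stage (3) then scores such traces offline.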

Key Designs

  1. Attribute/Relation Triple Selector:

    • Function: filters redundant triples before LLM reasoning, retaining the most discriminative information.
    • Mechanism: The attribute selector applies an entropy-based criterion \(H(a) = -\sum p(v)\log p(v)\)—attributes whose values are more uniformly distributed (higher entropy) across candidate entities have stronger discriminative power. The relation selector uses inverse-frequency weighting \(I(r) = \log(N/(\text{freq}(r)+1))\)—rarer relations are more discriminative. Predefined important attributes are also preserved.
    • Design Motivation: Feeding all triples indiscriminately wastes tokens and introduces noise. The selector acts as an information bottleneck, retaining only critical signals.
  2. Reward-Guided Path Optimization:

    • Function: continuously improves the agent's tool planning strategy.
    • Mechanism: The reward function \(\gamma = \gamma_\mu + c \cdot \gamma_{\text{ref}} + \gamma_e\) comprises three components: (1) alignment correctness \(\gamma_\mu\) (primary); (2) reflection quality \(\gamma_{\text{ref}}\) (rewards successful corrections, penalizes erroneous modifications, lightly penalizes redundant reflection); (3) path efficiency \(\gamma_e = e^{-\beta \cdot l}\) (penalizes excessively long paths). The policy is optimized via reward-guided offline SFT by rewriting paths.
    • Design Motivation: Single-round planning may produce redundant or inefficient paths. The closed-loop cycle of "plan → execute → evaluate → update" ensures continuous policy improvement.
  3. Reflector (conditionally activated):

    • Function: verifies and corrects uncertain alignment results.
    • Mechanism: An LLM-based module activated only when candidate similarity scores indicate ambiguity. It re-evaluates candidates based on prior context and provides a revised prediction.
    • Design Motivation: Not all alignments require reflection—direct decision-making is more efficient for straightforward cases, and additional verification is triggered only under uncertainty.
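The two selector criteria can be written down directly from the formulas above. The function names and the toy inputs are ours; only the formulas \(H(a)\) and \(I(r)\) come from the paper.

```python
import math
from collections import Counter

def attribute_entropy(values):
    """H(a) = -sum p(v) log p(v) over observed values of attribute a.
    A more uniform value distribution gives higher entropy, i.e. the
    attribute separates candidate entities better."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def relation_informativeness(freq, n_total):
    """I(r) = log(N / (freq(r) + 1)): rarer relations score higher."""
    return math.log(n_total / (freq + 1))
```

For example, an attribute taking three distinct values across three candidates scores \(\log 3\), while one shared by all candidates scores 0 and would be pruned; likewise a relation occurring 99 times among 100 scores near 0.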
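The three-component reward \(\gamma = \gamma_\mu + c \cdot \gamma_{\text{ref}} + \gamma_e\) is also easy to sketch. The coefficients (`c`, `beta`) and the discrete reflection scores below are illustrative placeholders; the paper does not report its exact values.

```python
import math

def path_reward(correct, reflection, path_len, c=0.5, beta=0.1):
    """gamma = gamma_mu + c * gamma_ref + gamma_e  (illustrative constants).

    correct    : whether the final alignment is right  -> gamma_mu
    reflection : None, "fixed", "broke", or "redundant" -> gamma_ref
    path_len   : number of tool calls l                 -> gamma_e = exp(-beta * l)
    """
    gamma_mu = 1.0 if correct else 0.0                 # alignment correctness (primary)
    gamma_ref = {None: 0.0, "fixed": 1.0,              # reward successful corrections,
                 "broke": -1.0, "redundant": -0.2}[reflection]  # penalize harm/waste
    gamma_e = math.exp(-beta * path_len)               # shorter paths score higher
    return gamma_mu + c * gamma_ref + gamma_e
```

Under these placeholder values, a correct answer reached via a successful reflection outscores one padded with a redundant reflection, and longer paths are smoothly discounted, matching the stated design intent.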

Key Experimental Results

Main Results (DBP15K)

| Method    | FR-EN Hits@1 | JA-EN Hits@1 | ZH-EN Hits@1 |
|-----------|--------------|--------------|--------------|
| GCN-Align | ~40          | ~40          | ~40          |
| TEA       | ~90          | ~90          | ~85          |
| ChatEA    | ~92          | ~91          | ~88          |
| EA-Agent  | ~95          | ~94          | ~91          |

Ablation Study

| Configuration         | Effect                                                          |
|-----------------------|-----------------------------------------------------------------|
| w/o triple selection  | Token consumption increases substantially; slight performance degradation |
| w/o reflector         | Error rate increases on uncertain cases                         |
| w/o path optimization | Planning strategy becomes unstable; more redundant tool invocations |
| Full EA-Agent         | Optimal performance + highest efficiency                        |

Key Findings

  • EA-Agent achieves state-of-the-art on all datasets, with up to 3.17% improvement in Hits@1 and consistent MRR gains.
  • The triple selector significantly reduces token consumption while maintaining or even improving performance—confirming that a large proportion of triples are indeed redundant.
  • Path optimization substantially improves planning quality: after 3 iterations, both path efficiency and alignment accuracy improve steadily.
  • Conditional activation of the reflector is the optimal strategy: always-on activation is inferior to on-demand activation.
  • Interpretability: every alignment decision is traceable to a specific tool invocation path and the corresponding key triples.

Highlights & Insights

  • Framing EA as a multi-step tool planning problem opens up the agent paradigm for knowledge graph tasks.
  • The three-component reward function design is highly practical: it balances correctness, reflection quality, and efficiency, avoiding optimization bias toward any single objective.
  • The triple selector's use of information-theoretic criteria (entropy and inverse frequency) is a simple yet effective approach that can be directly transferred to other KG tasks.

Limitations & Future Work

  • The approach depends on TEA to generate initial candidate lists, so candidate quality caps overall performance.
  • Path optimization requires multiple iterations, leading to relatively high training costs.
  • Validation is conducted only on cross-lingual EA; monolingual or cross-domain EA remains unexplored.
  • The tool pool is manually designed; whether new tools can be discovered automatically is an open question.
  • The reflector's judgments may introduce new hallucinations.
Comparison with Related Work

  • vs. ChatEA: ChatEA formats KG structure as code, but its alignment decisions remain black-box. EA-Agent achieves interpretable decisions through tool planning.
  • vs. LLMEA: LLMEA directly inputs all triples, whereas EA-Agent selects before aligning, yielding higher efficiency.
  • vs. general agent frameworks: EA-Agent specializes the agent paradigm for KG tasks; both the tool design and reward function are task-specific.

Rating

  • Novelty: ⭐⭐⭐⭐ Introducing the agent paradigm into EA is novel, though individual components (tool planning, LoRA fine-tuning) are not new.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ 3 datasets + 10 baselines + ablation + efficiency analysis + interpretability case studies.
  • Writing Quality: ⭐⭐⭐⭐ RQ-driven structure; formalization is clear.
  • Value: ⭐⭐⭐⭐ Provides methodological inspiration for LLM applications in the KG domain.