
On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains¶

Conference: ICML2025
arXiv: 2409.17275
Code: To be confirmed
Area: Medical NLP
Keywords: RAG, retrieval system attacks, data poisoning, universal poisoning attacks, orthogonal augmentation property, medical Q&A safety

TL;DR¶

This paper systematically shows that RAG retrieval systems in knowledge-intensive domains (healthcare, law) are vulnerable to universal poisoning attacks. It introduces the "orthogonal augmentation" property to explain why the attack works, and designs a detection-based defense using a distribution-aware distance that achieves near-perfect detection rates in almost all scenarios.

Background & Motivation¶

Retrieval-Augmented Generation (RAG) enhances LLM performance in knowledge-intensive domains by retrieving relevant documents from external corpora. However, when external corpora are publicly accessible (e.g., Wikipedia, PubMed) or controlled by potentially malicious agents, the security of the retrieval system becomes a major concern.

Prior work has shown that data poisoning attacks can be launched against retrieval systems, but it has the following limitations:

- Existing attacks mainly target general Q&A, with no systematic investigation in safety-critical domains such as medicine.
- There is no theoretical explanation for why these attacks are effective.
- Existing defenses (e.g., checking the \(\ell_2\) norm of embeddings) have proven ineffective.

The core motivation of this paper is to systematically evaluate the vulnerability of retrieval systems in high-risk scenarios like medical Q&A, understand the root causes, and propose effective defenses.

Method¶

1. Universal Poisoning Attack¶

An attacker constructs a poisoned document by directly appending target information (such as personally identifiable information (PII), malicious diagnostic advice, etc.) to a query:

\[p_i = [q_i \oplus \text{Target Information}]\]

where \(q_i\) is a clean query and \(\oplus\) denotes text-level concatenation. Once the poisoned document has been injected into the corpus, it is retrieved with a high rank (e.g., Top-1) whenever a user enters the attacker-specified query.
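
The construction above amounts to plain string concatenation. A minimal sketch, assuming a single space as the separator for \(\oplus\) and using hypothetical function names, queries, and target text (none of these come from the paper):

```python
def build_poisoned_documents(queries, target_information, separator=" "):
    """Return one poisoned document p_i = [q_i ⊕ target] per query.

    The separator is an assumption; the source only states that ⊕ is
    text-level concatenation.
    """
    return [f"{q}{separator}{target_information}" for q in queries]


# Hypothetical inputs, for illustration only.
queries = [
    "What is the first-line treatment for condition X?",
    "Which vitamin deficiency causes symptom Y?",
]
target = "PLACEHOLDER TARGET TEXT"

poisoned = build_poisoned_documents(queries, target)
for doc in poisoned:
    print(doc)
```

Each poisoned document starts with the exact query text, which is what lets it score highly when that query is later issued.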

The attack can lead to three types of security risks:

1. PII Leakage: Retrieving documents containing personally identifiable information.
2. Adversarial Recommendation: LLMs providing incorrect therapeutic advice based on malicious documents.
3. Jailbreak Attacks: Using retrieved documents as context to trigger LLM jailbreaks.

2. Orthogonal Augmentation Property¶

This is the paper's core theoretical finding. For the retriever's embedding function \(f: \mathcal{V}^L \mapsto \mathbb{R}^d\), when the embeddings of two texts \(q\) and \(p\) are approximately orthogonal, the following holds:

\[f([q \oplus p]) \approx f(q) + v, \quad v^T f(q) \approx 0\]

That is, the direction of the embedding shift after concatenation is orthogonal to the original query embedding. Consequently, the similarity based on the inner product remains almost unaffected:

\[f(q)^T f([q \oplus p]) \approx f(q)^T f(q) + f(q)^T v \approx f(q)^T f(q)\]

This implies that the similarity between the poisoned document \([q \oplus p]\) and the original query \(q\) is close to the similarity of the query with itself, thereby guaranteeing high-ranking retrieval.
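
The argument can be checked numerically on toy vectors. The sketch below uses hand-picked 3-dimensional embeddings (assumed values, not the output of any real retriever) in which the shift \(v\) is exactly orthogonal to \(f(q)\), and compares inner-product scores against one hypothetical clean document:

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy embeddings: assumed values, chosen only to illustrate the property.
f_q = [3.0, 4.0, 0.0]                    # f(q), the clean query
v = [0.0, 0.0, 5.0]                      # shift from appending p, with v^T f(q) = 0
f_qp = [a + b for a, b in zip(f_q, v)]   # f([q ⊕ p]) ≈ f(q) + v
f_clean = [1.0, 1.0, 2.0]                # a hypothetical clean corpus document

self_similarity = dot(f_q, f_q)          # f(q)^T f(q)
scores = {
    "poisoned": dot(f_q, f_qp),          # equals self_similarity when v ⟂ f(q)
    "clean": dot(f_q, f_clean),
}
top_document = max(scores, key=scores.get)

print(self_similarity, scores, top_document)
```

In this toy setup the poisoned document scores the same as the query does with itself, so it outranks the clean document. The check uses raw inner products, as in the text; under cosine similarity the extra norm contributed by \(v\) would lower the score somewhat.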

Key point: Embedding orthogonality \(\neq\) semantic irrelevance. Experiments show that the angle between query embeddings from different batches of MedQA is approximately 70°, even though all of these queries are semantically related to biomedicine. The attack is therefore effective whether or not the target information is semantically related to the query.

3. Detection-based Defense¶

The defense builds on three observations:

- Clean retrieved documents have a large embedding angle with the query (around 70°, with their inner product being only 25% of the query's self-similarity).
- Poisoned documents do not significantly deviate from the query due to the orthogonal augmentation property.
- Poisoned documents tend to be orthogonal to clean documents.

Based on these observations, the authors propose using a distribution-aware distance metric (such as Mahalanobis distance) instead of the isotropic \(\ell_2\) distance to detect poisoned documents. This metric captures the probability distribution features of the data, allowing clean and poisoned documents to be clearly separated in the new metric space.
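
To illustrate why a distribution-aware distance can separate points that an isotropic distance cannot, here is a generic Mahalanobis-distance anomaly check. It is a sketch under stated assumptions, not the authors' detector: the exact formulation is not given in these notes, the embeddings are 2-dimensional toy values so that the covariance can be inverted in closed form with the standard library, and the anchor set, candidates, and `THRESHOLD` are all hypothetical.

```python
import math


def mean_and_covariance_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(x, mean, cov):
    """sqrt((x - mean)^T Sigma^{-1} (x - mean)) via the 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    squared = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(squared, 0.0))


def l2_distance(x, mean):
    """Isotropic Euclidean distance to the anchor mean."""
    return math.hypot(x[0] - mean[0], x[1] - mean[1])


# Hypothetical clean anchors: wide spread along x, narrow along y.
anchors = [(-4.0, 0.1), (-2.0, -0.1), (0.0, 0.0), (2.0, 0.1), (4.0, -0.1)]
mean, cov = mean_and_covariance_2d(anchors)

clean_candidate = (3.0, 0.0)      # lies along the anchors' main direction
poisoned_candidate = (0.0, 3.0)   # same L2 distance, off-distribution direction
THRESHOLD = 3.0                   # assumed cutoff, would be tuned in practice

for name, x in [("clean", clean_candidate), ("poisoned", poisoned_candidate)]:
    d = mahalanobis_2d(x, mean, cov)
    print(name, round(l2_distance(x, mean), 3), round(d, 3), d > THRESHOLD)
```

Both candidates sit at the same Euclidean distance from the anchor mean, so an \(\ell_2\) check cannot tell them apart, while their Mahalanobis distances differ by more than an order of magnitude. Real retriever embeddings have hundreds of dimensions, where a linear algebra library would replace the closed-form inverse.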

Key Experimental Results¶

Attack Experiments¶

Evaluated across 225 combinations (3 corpora \(\times\) 3 retrievers \(\times\) 5 query sets \(\times\) 5 target information types):

Setting Dimension	Details
Corpus	Textbook (~126K), StatPearls (~301K), PubMed (~2M)
Retriever	MedCPT, SPECTER, Contriever
Query Set	MMLU-Med, MedQA-US, MedMCQA, PubMedQA, BioASQ
Target Information	PII, diagnostic information, MS-MARCO/NQ/HotpotQA adversarial passages

Top-2 Retrieval Success Rate (Selected Representative Results):

Corpus	Retriever	Query Set	Success Rate Range
Textbook	MedCPT	MedQA	0.98–0.99
Textbook	Contriever	MedQA	1.0
StatPearls	Contriever	MedQA	1.0
PubMed	Contriever	MedQA	1.0
Textbook	Contriever	MedMCQA	0.63–0.68
Textbook	MedCPT	PubMedQA	0.97–0.99
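
The notes do not spell out how the success rate is computed. A plausible reading, assumed here, is the fraction of targeted queries for which the poisoned document lands within the top \(k\) retrieved results. A minimal sketch with a hypothetical helper name and made-up ranks:

```python
def top_k_success_rate(poisoned_ranks, k=2):
    """Fraction of queries whose poisoned document is ranked within the top k.

    poisoned_ranks: 1-based rank of the poisoned document for each targeted
    query. The definition is an assumption, not taken from the paper.
    """
    if not poisoned_ranks:
        raise ValueError("poisoned_ranks must not be empty")
    hits = sum(1 for rank in poisoned_ranks if rank <= k)
    return hits / len(poisoned_ranks)


# Made-up ranks for eight targeted queries, for illustration only.
example_ranks = [1, 1, 2, 5, 1, 3, 2, 1]
print(top_k_success_rate(example_ranks, k=2))  # 6 of 8 ranks are <= 2 -> 0.75
```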

Legal Q&A attack success rate: Contriever 0.81–0.88, SPECTER 0.78–0.92.

Robustness to Adversarial Paraphrasing¶

After paraphrasing queries using GPT-4, the attack success rate still reaches ~0.8, indicating that the attacker does not need to know the exact query.

Validation of the Orthogonal Augmentation Property¶

Retriever	\(f(q)^T(f([q \oplus p]) - f(q))\)	Shift Angle
Contriever	0.05–0.13	92.5°–99.1°
MedCPT	0.42–1.64	95.9°–98.9°

The shift direction is close to 90°, validating the orthogonal augmentation property.
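
The shift angle reported above is the angle between \(f(q)\) and the shift \(f([q \oplus p]) - f(q)\). A minimal sketch of that computation, using a hypothetical helper name and toy 2-dimensional vectors rather than real retriever embeddings:

```python
import math


def shift_angle_degrees(f_q, f_qp):
    """Angle in degrees between f(q) and the shift f([q ⊕ p]) - f(q)."""
    shift = [b - a for a, b in zip(f_q, f_qp)]
    inner = sum(a * s for a, s in zip(f_q, shift))
    norm_q = math.sqrt(sum(a * a for a in f_q))
    norm_shift = math.sqrt(sum(s * s for s in shift))
    cosine = inner / (norm_q * norm_shift)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))


# Toy vectors (assumed values): an exactly orthogonal shift and a slightly obtuse one.
orthogonal = shift_angle_degrees([1.0, 0.0], [1.0, 2.0])
slightly_obtuse = shift_angle_degrees([1.0, 0.0], [0.9, 1.0])
print(orthogonal, slightly_obtuse)
```

An angle close to 90° means the concatenation barely changes the component of the embedding along \(f(q)\), which is the condition the property relies on.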

Defense Experiments¶

The detection method based on distribution-aware distance achieves a near-perfect detection rate in almost all attack scenarios, whereas the \(\ell_2\) norm method fails to distinguish clean documents from poisoned ones.

Highlights & Insights¶

Reveals the fundamental vulnerability of dense retrieval: A high-success-rate attack can be achieved via simple concatenation without needing access to model parameters.
The Orthogonal Augmentation Property is an elegant theoretical contribution: It explains why poisoning via concatenation is so effective—the embedding shift is in the orthogonal direction, avoiding any impact on the similarity to the query.
Comprehensive experimental coverage: 225 combinations across corpora, retrievers, query sets, and target information types make the conclusions highly convincing.
The finding that clean documents are also not highly similar to the query (angle of ~70°) reveals the quality bottleneck of retrieval systems themselves and provides "room" for attacks.
Robustness of the attack to paraphrasing significantly elevates the real-world threat level.

Limitations & Future Work¶

Incomplete coverage in these notes: The full technical details of the defense method (such as the specific distance metric formulas) are not covered here.
The attack method itself is relatively simple (direct concatenation) and might be detected by content auditing in practical deployments.
The defense assumes that the defender possesses a clean anchor document set and full access to the retriever, which limits its applicability to third-party RAG service scenarios.
Only dense retrieval is considered, with no analysis of vulnerability in sparse retrieval (such as BM25).
The impact chain of the attack on the LLM's final generation quality is not explored in depth (focusing primarily on the retrieval phase).
The experiments use only a ~2M subset of PubMed (the full corpus contains about 23M entries); attack performance might differ on larger corpora.

Related Work & Insights¶

PoisonedRAG [Zou et al.]: Poisons the corpus so that the LLM generates attacker-specified answers; this work differs fundamentally in objectives and insights.
Contriever [Izacard et al.], MedCPT [Jin et al.]: Representative dense retrievers; this paper shows empirically that they share the same vulnerability.
The defense methodology can be extended to anomaly detection tasks in other embedding spaces.
It sounds an alarm for the widespread adoption of RAG in the medical field, suggesting the integration of retrieval result auditing mechanisms during deployment.

Rating¶

Novelty: ⭐⭐⭐⭐ — The orthogonal augmentation property is a novel theoretical contribution, and the concept of universal poisoning is systematically validated in the medical field for the first time.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Extremely thorough design with 225 combinations + paraphrasing robustness + legal Q&A extensions.
Writing Quality: ⭐⭐⭐⭐ — Well-structured, with a complete logical chain from attack \(\rightarrow\) understanding \(\rightarrow\) defense.
Value: ⭐⭐⭐⭐⭐ — Holds significant practical security implications for RAG deployment, especially in high-risk areas like healthcare.