ALGEN: Few-Shot Inversion Attacks on Textual Embeddings via Cross-Model Alignment

Conference: ACL 2025
arXiv: 2502.11308
Code: None
Area: Others
Keywords: Textual embedding inversion attacks, few-shot attacks, cross-model alignment, vector database security, privacy leakage

TL;DR

This paper proposes ALGEN, a few-shot inversion attack on textual embeddings. The attacker linearly aligns the victim's embedding space with their own and then decodes the aligned embeddings with a locally trained embedding-to-text generator. A single leaked sample already yields a partially successful attack, and 1,000 samples reach a Rouge-L of 45.75.

Background & Motivation

Background: With the popularity of LLMs and Vector Databases (Vector DBs), more and more private text data is processed and stored as numerical embeddings. Vector database services such as Pinecone and Weaviate, along with RAG systems, widely employ embeddings to support search and retrieval.

Limitations of Prior Work: Prior embedding inversion attack studies have shown that original texts can be reconstructed from embeddings. However, these methods require a massive amount of leaked data to train the attack model—Li et al. require 100k samples, Morris et al. (Vec2Text) require 1-5 million samples, and Huang et al. require 8k samples. Such large-scale data leakage assumptions are difficult to satisfy in practice.

Key Challenge: Attackers wish to launch effective attacks from just a few leaked embeddings, but the attack models in existing methods require large amounts of training data and typically need direct access to the victim encoder to obtain embeddings for training.

Goal: Significantly reduce the data requirements of embedding inversion attacks, enabling true few-shot or even single-shot attacks.

Key Insight: Instead of directly training the attack model on victim embeddings, this work aligns the victim embeddings with the attacker's own embedding space via a linear transformation, and then reuses a general-purpose decoder trained on the attacker's space.

Core Idea: A three-step attack pipeline—(1) Train a local embedding-to-text generator; (2) Use a small amount of leaked pairs to learn cross-model linear alignment via least squares; (3) Reconstruct texts by decoding the aligned victim embeddings in the attacker space.

Method

Overall Architecture

ALGEN consists of three independent stages: the first stage trains the attacker's local embedding-to-text generation model on a public corpus (without involving the victim); the second stage learns the linear mapping from the victim to the attacker's embedding space using a small amount of leaked data; the third stage combines the mapping and the decoder to launch the attack.

Key Designs

  1. Embedding-to-Text Generator:

    • Function: Decodes embeddings from the attacker's own encoder back into the original text.
    • Mechanism: FlanT5 is chosen as the backbone, and the FlanT5 decoder is fine-tuned using a public corpus (150k sentences from the MultiHPLT English dataset). The input is the sentence embedding \(\mathbf{e}_A\) generated by the attacker's encoder \(enc_A\) (obtained via mean pooling + L2 normalization), and the training objective is the cross-entropy loss. Once trained, this generator can decode embeddings in the attacker space into texts.
    • Design Motivation: This generator is completely independent of the victim model and can be trained in advance. Public corpora are easily accessible and do not expose the attacker's intentions.
  2. Embedding Space Alignment:

    • Function: Maps the victim embeddings to the attacker's embedding space.
    • Mechanism: Assume a small set of leaked pairs \((X, E_V)\), where \(X\) is the text and \(E_V = enc_V(X)\) are the victim embeddings. The attacker encodes the same texts with their own encoder to obtain \(E_A = enc_A(X)\), then solves \(E_V W \approx E_A\) in the least-squares sense. When \(E_V\) has full column rank, the closed-form solution is the normal equation \(W = (E_V^T E_V)^{-1} E_V^T E_A\). More generally, \(W = E_V^{+} E_A\), where \(E_V^{+}\) is the Moore-Penrose pseudoinverse, which remains defined when there are fewer leaked samples than embedding dimensions. This is a one-step linear solve that requires no gradient-based training.
    • Design Motivation: Approximate linear relations exist between different embedding spaces, as validated in research such as cross-lingual word vector alignment. Linear mapping can learn a reasonable transformation with very few samples—even a single sample can provide a partially effective alignment.
  3. Inversion Attack Execution:

    • Function: Combines alignment and decoding to complete text reconstruction.
    • Mechanism: Given the stolen victim embeddings \(E_V\), the original text is reconstructed via \(\hat{X} = dec_A(E_V W)\). The entire attack pipeline is highly efficient, requiring no iterative optimization or GPU-intensive computations.
    • Design Motivation: The computational cost of linear alignment is extremely low, allowing the attack to be executed rapidly and at scale.
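
The alignment and attack steps above can be sketched in NumPy. This is a minimal illustration on synthetic data, not the authors' code. The dimensions, the noise level, the random map `M` standing in for the relation between the two encoders, and the helper names `fit_alignment` and `mean_cosine` are all assumptions. The decoder \(dec_A\) is left out, so quality is measured as cosine similarity in the attacker space rather than on reconstructed text.

```python
import numpy as np

def fit_alignment(E_V, E_A):
    """Least-squares W for E_V @ W ~= E_A via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(E_V) @ E_A

def mean_cosine(A, B, eps=1e-12):
    """Mean row-wise cosine similarity between two matrices of equal shape."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + eps
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
d_a, d_v = 24, 32            # attacker / victim embedding sizes (assumed)
n_leak, n_test = 200, 50     # leaked pairs / embeddings to attack (assumed)

# Assumption: victim embeddings are a noisy linear image of attacker embeddings.
M = rng.normal(size=(d_a, d_v)) / np.sqrt(d_a)

def synthetic_pairs(n):
    E_A = rng.normal(size=(n, d_a))
    E_V = E_A @ M + 0.01 * rng.normal(size=(n, d_v))
    return E_V, E_A

E_V_leak, E_A_leak = synthetic_pairs(n_leak)
E_V_test, E_A_test = synthetic_pairs(n_test)

# Stage 2: closed-form alignment from the leaked pairs.
W = fit_alignment(E_V_leak, E_A_leak)

# With full column rank (n_leak >= d_v here) the normal equation gives the same W.
W_normal = np.linalg.solve(E_V_leak.T @ E_V_leak, E_V_leak.T @ E_A_leak)
same_solution = np.allclose(W, W_normal, atol=1e-6)

# Stage 3 (without the decoder): map stolen embeddings into the attacker space.
aligned = E_V_test @ W
alignment_quality = mean_cosine(aligned, E_A_test)
print(same_solution, round(alignment_quality, 3))
```

In a real attack, `aligned` would be passed to the fine-tuned generator. The pseudoinverse form is used because it stays well defined when `n_leak` is smaller than `d_v`, which is the few-shot regime the paper targets.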

Loss & Training

The generator training utilizes cross-entropy loss, the AdamW optimizer (learning rate 1e-4, weight decay 1e-4), and a batch size of 128. The alignment stage requires no training and directly uses the closed-form least-squares solution.
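
The summary only states that the generator is trained with cross-entropy. Assuming the standard teacher-forced sequence-to-sequence form (an assumption, not a formula quoted from the paper), the objective for a sentence \(x = (x_1, \dots, x_T)\) with attacker embedding \(\mathbf{e}_A = enc_A(x)\) and generator parameters \(\theta\) would read:

\[
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}, \mathbf{e}_A\right)
\]

Only this objective is optimized with gradients; the alignment matrix \(W\) is obtained in closed form and never enters the loss.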

Key Experimental Results

Main Results

| Method | Victim Model | Rouge-L | BLEU1 | COS |
| --- | --- | --- | --- | --- |
| Vec2Text Base | T5 | 17.38 | 21.47 | 0.4663 |
| Vec2Text Corrector | T5 | 15.81 | 18.35 | 0.4835 |
| ALGEN (1k samples) | T5 | 45.75 | 52.98 | 0.9464 |
| ALGEN (1k samples) | GTR | 38.27 | 42.59 | 0.8879 |
| ALGEN (1k samples) | OpenAI ada-2 | 41.45 | 46.70 | 0.9312 |
| ALGEN (1k samples) | OpenAI 3-large | 41.31 | 46.28 | 0.9066 |
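
For readers unfamiliar with the metrics, the sketch below shows one common way to compute a Rouge-L F-measure (based on the longest common subsequence of tokens) and a cosine similarity. It is an illustration only: whitespace tokenization, the F1 weighting, and the function names are assumptions, and the paper's exact metric configuration (and exactly which vectors COS compares) is not given in this summary.

```python
import math

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(reference, hypothesis):
    """Rouge-L F1 on whitespace tokens, returned as a fraction in [0, 1]."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
print(round(100 * score, 2))  # 83.33 (LCS has 5 of 6 tokens on both sides)
```

The table reports Rouge-L and BLEU1 on a 0 to 100 scale, hence the multiplication by 100 in the usage line.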

Ablation Study

| Leaked Samples | T5 Rouge-L | GTR Rouge-L | OpenAI Rouge-L | Description |
| --- | --- | --- | --- | --- |
| 1 | ~10 | ~8 | ~10 | Partially successful with a single sample |
| 10 | ~20 | ~15 | ~18 | Rapid improvement |
| 100 | ~35 | ~28 | ~32 | Approaching saturation |
| 1000 | 45.75 | 38.27 | 41.45 | Performance plateau |
| 3000+ | ~47 | ~40 | ~43 | Improvement slows down |
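
The qualitative trend in this ablation (some signal from one pair, fast gains, then a plateau) can be reproduced on a toy problem. The sketch below is an assumption-laden illustration, not a replication: the embeddings are synthetic, the victim space is a noisy linear image of the attacker space, and the score is cosine similarity after alignment rather than Rouge-L, so the printed numbers are not comparable to the table.

```python
import numpy as np

rng = np.random.default_rng(0)
d_a, d_v, n_test = 24, 32, 200   # assumed toy dimensions

# Assumption: a fixed random linear map relates the two embedding spaces.
M = rng.normal(size=(d_a, d_v)) / np.sqrt(d_a)

def synthetic_pairs(n):
    E_A = rng.normal(size=(n, d_a))
    E_V = E_A @ M + 0.01 * rng.normal(size=(n, d_v))
    return E_V, E_A

def mean_cosine(A, B, eps=1e-12):
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + eps
    return float(np.mean(num / den))

E_V_test, E_A_test = synthetic_pairs(n_test)

quality = {}
for n_leak in (1, 10, 100, 1000):
    E_V, E_A = synthetic_pairs(n_leak)
    # The pseudoinverse gives the minimum-norm solution when n_leak < d_v.
    W = np.linalg.pinv(E_V) @ E_A
    quality[n_leak] = mean_cosine(E_V_test @ W, E_A_test)

for n_leak, cos in quality.items():
    print(f"{n_leak:>5} leaked pairs -> mean cosine {cos:.3f}")
```

With fewer pairs than victim dimensions, `W` can only recover the subspace spanned by the leaked embeddings. This is one plausible reading of why quality rises quickly at first and flattens once the number of pairs exceeds the embedding dimension.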

Key Findings

  • ALGEN outperforms Vec2Text (which requires millions of samples) by a large margin using only 1k samples, with Rouge-L improving from 17.38 to 45.75.
  • It is equally effective on closed-source OpenAI embeddings (Rouge-L 41+), demonstrating the feasibility of black-box attacks.
  • Cross-domain attacks (trained on MultiHPLT, attacked on mMarco) remain effective, achieving a Rouge-L of approximately 20.
  • Cross-lingual attacks are feasible; the attack model trained in English can invert French/German/Spanish embeddings.
  • Existing defenses (WET watermarking, Shuffling, Gaussian noise, Differential Privacy) fail to effectively mitigate ALGEN—only adding extreme noise can reduce attack performance, which simultaneously severely degrades the utility of embeddings in downstream tasks.
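
The noise trade-off in the last finding can be illustrated with a toy Gaussian-noise defense. This is a hedged sketch on synthetic data: the `victim_embed` function, the noise scales, and the use of cosine similarity to the clean embedding as a crude utility proxy are all assumptions, and the paper's actual defenses and downstream utility evaluation are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
d_a, d_v = 24, 32   # assumed toy dimensions
M = rng.normal(size=(d_a, d_v)) / np.sqrt(d_a)

def mean_cosine(A, B, eps=1e-12):
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1) + eps
    return float(np.mean(num / den))

def victim_embed(E_A, sigma):
    """Hypothetical defended victim: returns (clean, released) embeddings,
    where released = clean + Gaussian noise of scale sigma."""
    clean = E_A @ M
    return clean, clean + sigma * rng.normal(size=clean.shape)

E_A_leak = rng.normal(size=(500, d_a))
E_A_test = rng.normal(size=(200, d_a))

results = {}
for sigma in (0.0, 0.1, 0.5, 2.0):
    _, E_V_leak = victim_embed(E_A_leak, sigma)
    clean_test, E_V_test = victim_embed(E_A_test, sigma)
    # The attacker fits the alignment on the noisy embeddings they actually see.
    W = np.linalg.lstsq(E_V_leak, E_A_leak, rcond=None)[0]
    attack = mean_cosine(E_V_test @ W, E_A_test)    # attack quality proxy
    utility = mean_cosine(E_V_test, clean_test)     # embedding utility proxy
    results[sigma] = (utility, attack)

for sigma, (utility, attack) in results.items():
    print(f"sigma={sigma:<4} utility={utility:.3f} attack={attack:.3f}")
```

In this toy setting, noise large enough to hurt the alignment also pushes the released embeddings far from their clean versions, which mirrors the reported tension between blocking the attack and preserving downstream utility.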

Highlights & Insights

  • Transferring embedding space alignment from NLP alignment tasks to security attack scenarios is an elegant paradigm. The closed-form solution of linear alignment means "translating" across models is completed with just one line of matrix multiplication, incurring almost zero computational cost.
  • Launching attacks with just 1 leaked sample drastically lowers the attack barrier, implying that even extremely minor data leakage in vector databases could lead to large-scale privacy risks.
  • Named entities (organization names, country names, etc.) were successfully reconstructed in experiments, demonstrating that the attack can leak genuinely sensitive information.

Limitations & Future Work

  • The upper bound of attack performance is restricted by the decoding ability of the local generator (Rouge-L upper bound is roughly 54); improving the generator may yield further gains.
  • The paper does not propose an effective defense, and the existing defenses it evaluates were found to be ineffective.
  • The linear alignment assumption might fail when embedding spaces differ substantially (e.g., image embeddings vs. text embeddings).
  • Inversion of dialogue/long text embeddings is not considered; currently limited to the sentence level.

Comparison with Related Work

  • vs Vec2Text (Morris et al.): Vec2Text requires millions of training samples and iterative access to the victim encoder, whereas ALGEN needs only 1k samples and a one-time linear alignment, reducing the data requirement by orders of magnitude.
  • vs Huang et al.: Uses adversarial training for embedding alignment but requires 8k samples and is non-differentiable; ALGEN's least-squares alignment is much simpler.
  • vs Chen et al.: Extends multilingual inversion attacks; ALGEN also supports cross-lingual transfer but requires much less data.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ The few-shot attack paradigm is a significant advancement in the field of embedding security, and the one-step linear alignment is elegant and concise.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive evaluation involving 6 victim models, 4 languages, 3 datasets, and 4 defense methods.
  • Writing Quality: ⭐⭐⭐⭐ Clear paper structure with complete derivations.
  • Value: ⭐⭐⭐⭐⭐ Highlights severe privacy risks of embedding services, issuing an important warning for vector database security.