CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems
Conference: ACL 2025
arXiv: 2506.19993
Code: GitHub
Area: Recommender Systems
Keywords: LLM-based Recommender Systems, Vocabulary Expansion, Embedding Compression, Sequential Recommendation, Hashed Compression
TL;DR
CoVE expands the LLM vocabulary by assigning each item a unique token ID and embedding, which turns sequential recommendation into a next-token prediction task. Compared with existing methods, CoVE improves recommendation accuracy by up to 62% and speeds up inference by roughly 100x, and it uses hashed embedding compression to keep the item embedding table within memory limits in large-scale scenarios.
Background & Motivation
Background: Large Language Models (LLMs) are increasingly applied in recommender systems, primarily through two paradigms: (a) utilizing LLMs to provide embedding initialization for non-LLM recommender models; (b) fine-tuning LLMs to directly generate target item titles, which are then mapped to real items through embedding retrieval (e.g., BIGRec).
Limitations of Prior Work:
- Paradigm (a) only exploits the embedding capability of LLMs without leveraging their content comprehension capabilities.
- Paradigm (b), the finetune-and-retrieval framework, suffers from three key issues: LLMs must accurately predict multi-token item titles (difficult), generated titles may not exist in the item space (hallucination problem), and text generation inference is slow.
Key Challenge: LLMs possess powerful next-token prediction capabilities, but existing recommendation frameworks fail to exploit this capability directly, instead requiring LLMs to perform the more challenging task of multi-token title generation.
Goal: To design a framework that allows LLMs to directly utilize next-token prediction for recommendation while addressing the memory efficiency issues of embedding tables in large-scale item spaces.
Key Insight: Inspired by vocabulary expansion techniques in domain adaptation, CoVE assigns a unique token to each item, which turns recommendation into a single-token prediction task.
Core Idea: Expand the LLM's vocabulary so that each item corresponds to a unique token, directly recommend using next-token prediction logits, and solve the embedding table memory bottleneck with hash compression.
Method
Overall Architecture
The core workflow of CoVE consists of:
1. Vocabulary Expansion: Add a unique token (e.g., <|205|>) for each item in the item space \(\mathcal{I}\) to the LLM's tokenizer.
2. Embedding Table Expansion: Map each item token to an independent, trainable embedding vector.
3. Fine-Tuning: Simultaneously train the item embedding table, LoRA adapter, and lm_head to align the LLM with the recommendation task.
4. Inference: Given a user's historical interaction sequence, read the logits at the positions that correspond to item tokens and rank items by those scores, avoiding text generation entirely.
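Steps 1 and 2 above can be illustrated with a toy, dependency-free sketch. The dictionary-based tokenizer, the `expand_vocabulary` helper, and the Gaussian initialization are assumptions made for illustration, not the authors' implementation; only the `<|205|>`-style token format comes from the text.

```python
import random

def expand_vocabulary(vocab, embeddings, num_items, dim, seed=0):
    """Append one token and one embedding row per item.

    vocab: dict mapping token string -> token id (hypothetical toy tokenizer).
    embeddings: list of embedding rows, one per existing token.
    Returns the new item token ids, in item order.
    """
    rng = random.Random(seed)
    item_token_ids = []
    for item in range(num_items):
        token = f"<|{item}|>"      # item token format used in the text
        token_id = len(vocab)      # new ids are appended after the base vocabulary
        vocab[token] = token_id
        # Assumed initialization: small random values; the source does not specify one.
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
        item_token_ids.append(token_id)
    return item_token_ids

# Toy base vocabulary standing in for the LLM tokenizer.
vocab = {"<pad>": 0, "the": 1, "user": 2, "bought": 3}
embeddings = [[0.0] * 4 for _ in vocab]
item_ids = expand_vocabulary(vocab, embeddings, num_items=3, dim=4)
print(item_ids)          # [4, 5, 6]
print(vocab["<|2|>"])    # 6
```

Because item tokens are appended after the base vocabulary, they occupy the last \(|\mathcal{I}|\) positions of the output logits. In Hugging Face Transformers the analogous operations would be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; whether the authors use exactly these calls is not stated in this summary.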
Key Designs
1. Fine-tuning Task Design
- Function: Model the recommendation task as standard next-token prediction.
- Mechanism: Training samples contain task instructions, user history (task input, containing item IDs and titles), and the target item (task output). During training, the next-token prediction loss is minimized; during inference, only the scores corresponding to the last \(|\mathcal{I}|\) dimensions of the logits output by the lm_head are needed.
- Design Motivation: Simplify multi-token title generation into single-token ID prediction, eliminating hallucinations and significantly accelerating inference.
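A minimal sketch of the sample layout and of ranking from the last \(|\mathcal{I}|\) logits might look as follows. The instruction wording, the item title, and the helper names `build_sample` and `rank_items` are hypothetical; the sketch also assumes item tokens occupy the final positions of the vocabulary, as the mechanism above implies.

```python
def build_sample(history, target_id):
    """Build a (prompt, target) pair.

    history: list of (item_id, title) pairs, oldest first.
    The instruction text below is a placeholder, not the paper's prompt.
    """
    lines = ["Given the user's interaction history, predict the next item."]
    for item_id, title in history:
        lines.append(f"<|{item_id}|> {title}")   # item ID token followed by its title
    prompt = "\n".join(lines)
    target = f"<|{target_id}|>"                  # single-token task output
    return prompt, target

def rank_items(logits, num_items, k):
    """Rank items using only the last `num_items` entries of the logits."""
    item_scores = logits[-num_items:]
    order = sorted(range(num_items), key=lambda i: item_scores[i], reverse=True)
    return order[:k]

prompt, target = build_sample([(205, "Lip Balm")], target_id=17)
print(target)                                    # <|17|>

# Toy logits over a vocabulary of 7 tokens: 4 base tokens + 3 item tokens.
logits = [0.3, 1.2, -0.5, 2.0, 0.1, 0.9, 1.7]
print(rank_items(logits, num_items=3, k=2))      # [2, 1]
```

Note that the highest logit overall (2.0, a base token) is ignored: only item positions are scored, so every recommendation is a valid item and no text is generated.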
2. Hashed Embedding Compression
- Function: Compress the item embedding table from \(|\mathcal{I}|\) rows to \(|\mathcal{S}|\) rows (where \(|\mathcal{S}| \ll |\mathcal{I}|\)).
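- Mechanism: Define \(k\) universal hash functions \(h_1, \ldots, h_k\), each mapping an item to one of the \(|\mathcal{S}|\) rows of a shared embedding table. The embedding of item \(i\) is the average of the \(k\) shared embeddings it hashes to. Writing \(\mathbf{E}_{\mathcal{S}}\) for the shared table and \(\mathbf{e}_i\) for the item embedding (symbols chosen here for illustration):
\[\mathbf{e}_i = \frac{1}{k} \sum_{j=1}^{k} \mathbf{E}_{\mathcal{S}}[h_j(i)]\]
Each hash function uses simple arithmetic: \(h(i) = ((ai + b) \bmod p) \bmod |\mathcal{S}|\), where, in the standard universal hashing construction, \(p\) is a prime larger than the number of items and \(a\), \(b\) are randomly drawn integers.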
- Design Motivation: In large-scale scenarios (e.g., the Amazon dataset containing 48.19 million items), directly storing the embedding table requires approximately 96GB of GPU memory. Hashed compression makes training feasible.
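The hashed lookup described above can be sketched in plain Python. This is an illustrative toy under stated assumptions: the class name `HashedEmbedding`, the choice of prime \(p = 2^{31} - 1\), the random initialization, and reading "compression ratio" as \(|\mathcal{I}| / |\mathcal{S}|\) are all assumptions rather than details confirmed by the source.

```python
import random

class HashedEmbedding:
    """Toy hashed embedding table: k hash functions into a shared table."""

    def __init__(self, num_items, num_shared, dim, k, seed=0):
        rng = random.Random(seed)
        self.num_shared = num_shared
        self.p = 2_147_483_647          # assumed prime (2^31 - 1), must exceed num_items
        assert num_items < self.p
        # Each hash function h_j is defined by a random pair (a_j, b_j).
        self.hashes = [(rng.randrange(1, self.p), rng.randrange(0, self.p))
                       for _ in range(k)]
        # Shared table with |S| rows instead of |I| rows.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_shared)]

    def rows(self, item):
        """h_j(i) = ((a_j * i + b_j) mod p) mod |S| for each hash function."""
        return [((a * item + b) % self.p) % self.num_shared
                for a, b in self.hashes]

    def lookup(self, item):
        """Average the k shared rows that the item hashes to."""
        rows = self.rows(item)
        dim = len(self.table[0])
        return [sum(self.table[r][d] for r in rows) / len(rows)
                for d in range(dim)]

# 1000 items stored in 500 shared rows (ratio 2 under the assumed definition).
emb = HashedEmbedding(num_items=1000, num_shared=500, dim=8, k=2)
vec = emb.lookup(205)
print(len(vec))          # 8
```

Two items can collide under one hash function, but with \(k > 1\) they are unlikely to collide under all of them, so their averaged embeddings generally remain distinct. In a trainable version the shared table would be a learnable parameter and gradients would flow to the \(k\) rows used by each item.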
Loss & Training
- Loss Function: Standard next-token prediction loss (cross-entropy).
- Training Configuration:
  - Beauty/Toys/Sports datasets: LLaMA-3.2-3B, learning rate \(10^{-4}\), batch size 32, LoRA rank 8, alpha 16, up to 10 epochs.
  - Video Games dataset: LLaMA-2-7B + 4-bit QLoRA.
- Trainable Parameters: Item embedding table, LoRA adapter, lm_head.
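As a rough illustration of the loss and configuration listed above, the sketch below computes the cross-entropy for a single target position and collects the stated hyperparameters in a dictionary. The function name `next_token_loss` and the dictionary keys are hypothetical; only the values come from the text.

```python
import math

# Hyperparameters for Beauty/Toys/Sports as listed above; key names are illustrative.
TRAIN_CONFIG = {
    "base_model": "LLaMA-3.2-3B",
    "learning_rate": 1e-4,
    "batch_size": 32,
    "lora_rank": 8,
    "lora_alpha": 16,
    "max_epochs": 10,
}

def next_token_loss(logits, target_id):
    """Cross-entropy at one position: -log softmax(logits)[target_id]."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_id]

# With uniform logits over 4 tokens the loss equals log(4), about 1.386.
loss_uniform = next_token_loss([0.0, 0.0, 0.0, 0.0], target_id=2)
# Raising the target item's logit lowers the loss.
loss_better = next_token_loss([0.0, 0.0, 3.0, 0.0], target_id=2)
print(loss_better < loss_uniform)    # True
```

In CoVE the target token is the item token for the next interaction, so minimizing this loss pushes up the logit of the correct item relative to every other token in the expanded vocabulary.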
Key Experimental Results
Main Results
On three Amazon datasets (Beauty/Toys/Sports) with a compression ratio of 2, CoVE vs. the best baseline (TIGER). NG denotes NDCG and HR denotes hit rate, each computed over the top-K ranked items:
| Dataset | Metric | TIGER | CoVE | Gain |
|---|---|---|---|---|
| Beauty | NG@5 | 0.0321 | 0.0498 | +55% |
| Beauty | HR@10 | 0.0648 | 0.1009 | +56% |
| Toys | NG@5 | 0.0371 | 0.0509 | +37% |
| Toys | HR@5 | 0.0521 | 0.0719 | +38% |
| Sports | NG@5 | 0.0204 | 0.0296 | +45% |
| Sports | HR@10 | 0.0400 | 0.0624 | +56% |
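For readers unfamiliar with the metrics in the table, here is a minimal sketch of HR@K and NDCG@K for one user. It assumes that NG abbreviates NDCG and that each test case has a single held-out target item, which is the common setup in sequential recommendation but is not spelled out in this summary.

```python
import math

def hit_rate_at_k(ranked, target, k):
    """1.0 if the target appears among the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked, target, k):
    """NDCG@k with one relevant item: 1 / log2(rank + 1) if ranked in the top k."""
    if target in ranked[:k]:
        rank = ranked.index(target) + 1   # 1-based position of the target
        return 1.0 / math.log2(rank + 1)  # ideal DCG is 1 for a single target
    return 0.0

ranked = [7, 3, 9, 1]                     # item ids sorted by descending score
print(hit_rate_at_k(ranked, 9, 2))        # 0.0 (target is at rank 3)
print(hit_rate_at_k(ranked, 9, 5))        # 1.0
print(ndcg_at_k(ranked, 9, 5))            # 0.5
```

Reported scores are averages of these per-user values over the test set, which is why HR@K is never smaller than NDCG@K at the same K.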
CoVE vs. BIGRec (finetune-and-retrieval) on the Video Games dataset:
| Metric | BIGRec | CoVE | Gain |
|---|---|---|---|
| NG@5 | 0.0189 | 0.0221 | +17% |
| HR@10 | 0.0329 | 0.0437 | +33% |
| HR@20 | 0.0457 | 0.0621 | +36% |
Inference speed: CoVE runs at 6.5 samples/s compared to BIGRec's 0.066 samples/s, achieving an approximate 100x speedup.
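The Gain columns and the speedup figure follow from simple ratios, which the short sketch below reproduces from the numbers in the tables. The helper name `relative_gain` is illustrative.

```python
def relative_gain(baseline, ours):
    """Relative improvement of `ours` over `baseline`, as a fraction."""
    return (ours - baseline) / baseline

# Selected rows from the tables above: (baseline score, CoVE score).
print(round(100 * relative_gain(0.0321, 0.0498)))   # 55  (Beauty NG@5 vs. TIGER)
print(round(100 * relative_gain(0.0400, 0.0624)))   # 56  (Sports HR@10 vs. TIGER)
print(round(100 * relative_gain(0.0189, 0.0221)))   # 17  (Video Games NG@5 vs. BIGRec)

# Inference throughput in samples per second.
speedup = 6.5 / 0.066
print(round(speedup, 1))                            # 98.5
```

The throughput ratio of about 98.5 is what the text rounds to an approximate 100x speedup.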
Ablation Study
Importance of item titles and embedding table training (Beauty dataset):
| Setting | NG@5 | HR@5 |
|---|---|---|
| Trainable Embeddings Only (No Titles) | 0.045 | 0.0622 |
| Title Information Only (Frozen Embeddings) | 0.0057 | 0.0094 |
| CoVE (Both Combined) | 0.0498 | 0.0714 |
Robustness of embedding compression: Under a 16x compression ratio, CoVE still outperforms the SOTA baseline (TIGER) on HR@5 and NG@5, with the sole exception of HR@10 on the Toys dataset.
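Key Findings
- CoVE consistently outperforms all baselines across four datasets. Improvements in NG and HR are reported as 30%-62%; the Video Games comparison with BIGRec shown above includes smaller gains (+17% on NG@5), so this range presumably describes the Beauty/Toys/Sports results.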
- The fine-tuned LLM successfully learns the mapping between item IDs and titles, which is crucial for high-quality recommendation.
- Freezing the embedding table causes a drastic decline in performance, indicating that learning high-quality item embeddings is critical.
- The robustness of embedding compression varies across datasets; Sports and Toys remain stable under 8x compression, while Beauty is more sensitive.
Highlights & Insights
- Elegant Problem Transformation: Converts recommendation from "generating item titles" to "predicting item ID tokens", resolving hallucination, speed, and accuracy issues simultaneously.
- Balance between Theory and Practice: Hashed embedding compression makes the framework viable for large-scale industrial scenarios, where an uncompressed embedding table for a 48M-item catalog would need roughly 96GB of GPU memory.
- Thorough Experiments: Evaluated on 4 datasets against 12+ baselines, with multiple ablation studies, inference speed analyses, and case studies.
- Insightful Case Study: Shows that the fine-tuned LLM can output correct ID-title correspondences during generation, suggesting that CoVE enables the LLM to learn item semantics.
Limitations & Future Work
- Embedding compression was only explored using hash methods; more advanced compression techniques (quantization, low-rank approximation) warrant future investigation.
- Experiments were restricted to Amazon e-commerce datasets; validation on other domains (news, video, music) is lacking.
- The cold-start problem (how to rapidly obtain high-quality embeddings for new items) remains undiscussed.
- Sensitivity to compression ratios varies by dataset, and adaptive compression strategies are currently absent.
Related Work & Insights
- BIGRec (Bao et al., 2023): A representative of the finetune-and-retrieval framework and the primary comparison target for CoVE.
- TIGER (Rajput et al., 2023): A SOTA method among non-LLM baselines, which CoVE significantly outperforms.
- ALPT (Li et al., 2023b): Adaptive low-precision training, which can potentially be integrated into CoVE's embedding compression in the future.
- Vocabulary expansion for domain adaptation (Cui et al., 2023; Liu et al., 2024a): The inspiration source for CoVE, extending vocabulary expansion from language adaptation to recommendation scenarios.
Rating
- Novelty: ⭐⭐⭐⭐ — Applying vocabulary expansion to recommender systems introduces a novel perspective with an elegant problem transformation.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — 4 datasets, 12+ baselines, multi-dimensional ablation studies, and inference speed analysis yield an exceptionally solid evaluation.
- Writing Quality: ⭐⭐⭐⭐ — Features a clear structure, well-articulated motivation, and well-designed figures/tables.
- Value: ⭐⭐⭐⭐ — High practical value for industry deployment given the 100x inference acceleration and substantial accuracy gains.