MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment
Conference: ACL 2025
arXiv: 2503.01711
Area: LLM NLP
Keywords: Personalized Search, Search Motivation, Consultation Alignment, Mixture of Attention Experts, E-commerce Search
TL;DR
This work is the first to model "search motivation"—the genuine user needs latent in pre-search consultation behaviors—and proposes the MAPS framework that integrates LLM semantics, MoAE pooling, and dual-alignment mechanisms, improving HR@10 by 24.4% (from 0.5685 to 0.7071) on real-world commercial data.
Background & Motivation
- Core Finding: Analyzing real-world e-commerce platform data reveals that a significant proportion of users consult (e.g., with AI customer service) before searching, and these consultations implicitly contain the true motivations behind search queries.
- Limitations of Prior Work:
  - Traditional personalized search assumes that the query fully expresses the user's intent.
  - In reality, a user searching for "X-600" may not be sure it is the best choice and may need multiple searches and comparisons.
  - Search motivation is the underlying need that the search must ultimately satisfy.
- Key Challenges:
  - Query Alignment: Consultations are long-form text while search queries are short keywords, leaving a large gap in semantic space.
  - Item Feature Alignment: Items are described by categorical attributes, whereas consultations are in natural language, creating a modality gap.
  - User History Alignment: Not all consultation history is relevant to the current search, so noise must be filtered out.
Method
Overall Architecture
MAPS consists of three core modules:

1. ID-Text Representation Fusion: Leverages LLM embeddings and MoAE pooling.
2. Mapping-based General Alignment: Aligns token-item relationships via contrastive learning.
3. Sequential Personalized Alignment: Fuses motivation awareness through bidirectional attention.
Key Designs
1. Mixture of Attention Experts (MoAE) Pooling

Three types of attention experts, each with \(N_E\) members:

- Parametric Attention Pooling: Maintains learnable query vectors and computes attention weights over input tokens.
- Self-Attention Pooling: Performs token-to-token query-key attention weighting.
- Search-Centric Cross-Attention Pooling: Uses the search query embedding as the attention query, so that pooling over other text focuses on search-relevant semantics.

A Top-K gating mechanism selects which experts to activate and fuses their outputs via weighted summation to obtain the text embedding.
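The MoAE pooling described above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not stated in the summary: the gate scores come from a linear map over the mean token embedding, the selected gate scores are renormalized with a softmax, and the experts carry no projection matrices (so the self-attention and search-centric members are identical to one another here, whereas a real model would give each member its own parameters). All function names are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def attend(query, tokens):
    """Scaled dot-product attention pooling: one query vector over all tokens."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(len(tokens[0]))]

def parametric_expert(learned_query):
    # Pools with a learnable query vector (a fixed random stand-in in this sketch).
    return lambda tokens, search_query: attend(learned_query, tokens)

def self_attention_expert():
    # Every token attends over all tokens; the attended outputs are averaged.
    return lambda tokens, search_query: mean([attend(t, tokens) for t in tokens])

def search_centric_expert():
    # The search query embedding acts as the attention query.
    return lambda tokens, search_query: attend(search_query, tokens)

def moae_pool(tokens, search_query, experts, gate_weights, top_k):
    """Route to the top_k experts and sum their outputs with renormalized gates."""
    gate_input = mean(tokens)  # assumption: the gate sees the mean token embedding
    scores = [dot(w, gate_input) for w in gate_weights]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    outputs = [experts[i](tokens, search_query) for i in chosen]
    return [sum(g * out[d] for g, out in zip(gates, outputs)) for d in range(len(tokens[0]))]

# Toy usage with N_E = 2 members per expert type and random embeddings.
rng = random.Random(0)
dim, n_e = 4, 2

def rand_vec():
    return [rng.uniform(-1, 1) for _ in range(dim)]

experts = ([parametric_expert(rand_vec()) for _ in range(n_e)]
           + [self_attention_expert() for _ in range(n_e)]
           + [search_centric_expert() for _ in range(n_e)])
gate_weights = [rand_vec() for _ in experts]
tokens = [rand_vec() for _ in range(5)]
pooled = moae_pool(tokens, rand_vec(), experts, gate_weights, top_k=2)
```

Because every expert output is an attention-weighted average of the tokens and the gates sum to one, the pooled embedding stays inside the range spanned by the token embeddings in each dimension.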
2. Mapping-based General Alignment

- Collects each item's multi-scenario text set (searches, consultations, titles, ads, reviews, etc.).
- Filters noisy texts using frequency thresholds to establish token-item mappings.
- Optimizes a bidirectional contrastive loss \(\mathcal{L}_{GA}\):
  - Direction 1: Given an item, pull correct tokens closer and push incorrect tokens away.
  - Direction 2: Given a token, pull correct items closer and push incorrect items away.
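The two steps above (frequency filtering, then a two-direction contrastive objective) might look roughly like the sketch below. The summary does not give the exact loss, so this assumes an InfoNCE-style formulation with in-batch negatives and a temperature; the threshold `min_count`, the temperature value, and all function names are illustrative assumptions rather than the paper's settings.

```python
import math
from collections import Counter

def build_token_item_map(observations, min_count=2):
    """Keep (item, token) pairs observed at least min_count times (threshold is illustrative)."""
    counts = Counter(observations)
    mapping = {}
    for (item, token), n in counts.items():
        if n >= min_count:
            mapping.setdefault(item, set()).add(token)
    return mapping

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_softmax_at(scores, target):
    """Log-probability of scores[target] under a softmax over scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z

def bidirectional_contrastive_loss(item_vecs, token_vecs, temperature=0.1):
    """item_vecs[i] and token_vecs[i] are a positive pair; all other pairings are negatives."""
    n = len(item_vecs)
    sim = [[dot(iv, tv) / temperature for tv in token_vecs] for iv in item_vecs]
    # Direction 1: for each item, the matching token should score highest.
    item_to_token = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Direction 2: for each token, the matching item should score highest.
    token_to_item = -sum(log_softmax_at([sim[r][j] for r in range(n)], j) for j in range(n)) / n
    return item_to_token + token_to_item

# Toy usage: a rare token is dropped, and aligned pairs give a much lower loss.
observations = [("item_a", "waterproof")] * 3 + [("item_a", "hello")]
mapping = build_token_item_map(observations)
items = [[1.0, 0.0], [0.0, 1.0]]
aligned = bidirectional_contrastive_loss(items, [[1.0, 0.0], [0.0, 1.0]])
swapped = bidirectional_contrastive_loss(items, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy run the loss for correctly paired embeddings is close to zero, while swapping the token embeddings makes it large, which is the behaviour both directions of the objective are meant to enforce.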
3. Sequential Personalized Alignment

- Consultation Motivation Extraction: Uses the current query as an anchor and feeds it, together with the consultation history, into a Transformer Encoder. The output at position 0 is taken as the consultation motivation embedding \(e^C\).
- Search History Encoding: Processes the search query history in the same manner to obtain \(e^S\).
- Motivation-Aware Query Fusion: \(e' = \alpha_1 e^C + \alpha_2 e^S + \alpha_3 e_{query}\), where \(\alpha_1, \alpha_2, \alpha_3\) are learnable weights.
- Final Ranking: The motivation-aware query interacts with the item history via an Encoder and, together with the user embedding, yields \(p(v \mid s, H, u)\).
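As a rough illustration of the anchoring and fusion steps above, the sketch below prepends the current query to a history sequence, reads the encoder output at position 0, and then forms the weighted sum \(e' = \alpha_1 e^C + \alpha_2 e^S + \alpha_3 e_{query}\). The encoder is passed in as a hypothetical callable (an identity stand-in is used for the demo, not a real Transformer), and the weight values are made up; in the model they would be learned.

```python
def motivation_embedding(encoder, query_vec, history_vecs):
    """Prepend the current query as the anchor and read the encoder output at position 0."""
    outputs = encoder([query_vec] + history_vecs)
    return outputs[0]

def fuse_query(e_c, e_s, e_query, alphas):
    """Weighted sum e' = a1 * e_C + a2 * e_S + a3 * e_query, applied per dimension."""
    alpha_1, alpha_2, alpha_3 = alphas
    return [alpha_1 * c + alpha_2 * s + alpha_3 * q
            for c, s, q in zip(e_c, e_s, e_query)]

def identity_encoder(sequence):
    # Stand-in only: a real encoder would mix information across positions.
    return sequence

query = [1.0, 1.0]
anchor_out = motivation_embedding(identity_encoder, query, [[0.2, 0.4], [0.6, 0.8]])

# Hypothetical embeddings and weights, chosen so the arithmetic is easy to follow.
fused = fuse_query(e_c=[1.0, 0.0], e_s=[0.0, 1.0], e_query=query,
                   alphas=(0.5, 0.25, 0.25))
# fused == [0.75, 0.5]
```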
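Total Loss: \(\mathcal{L} = \mathcal{L}_{PA} + \lambda_3 \mathcal{L}_{GA} + \lambda_4 \|\Theta\|_2^2\), where \(\mathcal{L}_{PA}\) is the personalized alignment loss, \(\mathcal{L}_{GA}\) is the general alignment loss, and \(\Theta\) denotes the model parameters.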
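The total loss combines the two alignment terms with an L2 penalty on the parameters. A minimal sketch of that combination, using made-up loss values and coefficients purely to show the arithmetic (the paper's actual settings for \(\lambda_3\) and \(\lambda_4\) are not given in this summary):

```python
def total_loss(loss_pa, loss_ga, parameters, lambda_3, lambda_4):
    """L = L_PA + lambda_3 * L_GA + lambda_4 * ||Theta||_2^2 for a flat parameter list."""
    l2_penalty = sum(p * p for p in parameters)
    return loss_pa + lambda_3 * loss_ga + lambda_4 * l2_penalty

# Hypothetical numbers: 1.0 + 0.5 * 2.0 + 0.25 * (3^2 + 4^2) = 8.25
value = total_loss(loss_pa=1.0, loss_ga=2.0, parameters=[3.0, 4.0],
                   lambda_3=0.5, lambda_4=0.25)
```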
Key Experimental Results
Main Results
Datasets:

- Commercial: Real e-commerce platform with 2,096 users, 2,691 items, and 24,662 search interactions.
- Amazon: Amazon Reviews (PersonalWAB version), with simulated consultation text generated by GPT-4o.
Ranking Performance (Commercial Dataset):
| Model | HR@5 | HR@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|
| TEM | 0.4041 | 0.5685 | 0.2871 | 0.3402 |
| CoPPS | 0.4050 | 0.5637 | 0.2831 | 0.3445 |
| MAPS | 0.5281 | 0.7071 | 0.3780 | 0.4359 |
- Relative to TEM, HR@10 improves by 24.4% (0.5685 \(\to\) 0.7071) and NDCG@10 by 28.1% (0.3402 \(\to\) 0.4359).
- All comparisons against baselines are significant with \(p < 0.05\).
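For readers less familiar with the metrics, the sketch below shows one common way to compute HR@K and NDCG@K, and then reproduces the relative improvements quoted above from the table values. It assumes a single ground-truth item per query and 1-indexed ranks, which is a typical protocol but is not confirmed by this summary; the function names and toy ranks are illustrative.

```python
import math

def hit_rate_at_k(ranks, k):
    """Fraction of queries whose ground-truth item appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def ndcg_at_k(ranks, k):
    """NDCG with one relevant item per query: gain 1 / log2(rank + 1) if ranked within k."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / len(ranks)

def relative_gain(baseline, ours):
    return (ours - baseline) / baseline

# Toy ranks: hits at positions 1 and 3, one miss at position 12.
toy_ranks = [1, 3, 12]
hr = hit_rate_at_k(toy_ranks, 10)    # 2 of 3 queries hit
ndcg = ndcg_at_k(toy_ranks, 10)      # (1 + 1/2 + 0) / 3 = 0.5

# Relative gains of MAPS over TEM, using the Commercial table values.
hr_gain = relative_gain(0.5685, 0.7071)      # about 0.244
ndcg_gain = relative_gain(0.3402, 0.4359)    # about 0.281
```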
Amazon Dataset:
| Model | HR@5 | HR@10 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|
| CoPPS | 0.3870 | 0.4854 | 0.2788 | 0.3298 |
| MAPS | 0.5832 | 0.7735 | 0.4059 | 0.4676 |
- Relative to CoPPS, HR@10 improves by 59.3% and NDCG@10 by 41.8%.
Retrieval Performance (Commercial, MRR@10):
| Method | MRR@10 |
|---|---|
| BM25 | 0.2529 |
| BGE-M3 | 0.2976 |
| CHIQ | 0.3192 |
| MAPS | 0.3805 |
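MRR@10, the retrieval metric in the table above, averages the reciprocal rank of the first relevant result and counts a query as zero when that result falls outside the top 10. A minimal sketch, again assuming one relevant item per query and 1-indexed ranks (the toy ranks are made up):

```python
def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank, with ranks beyond k contributing zero."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)

# Toy ranks: first, second, and outside the top 10.
toy_mrr = mrr_at_k([1, 2, 11])   # (1 + 0.5 + 0) / 3 = 0.5

# Relative gain of MAPS over the strongest retrieval baseline in the table (CHIQ).
gain_over_chiq = (0.3805 - 0.3192) / 0.3192
```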
Comparison with Multi-scenario Methods (Commercial):
| Method | HR@10 | NDCG@10 |
|---|---|---|
| UniSAR | 0.5838 | 0.3577 |
| MAPS | 0.7071 | 0.4359 |
Key Findings
Ablation Study (Commercial):
| Configuration | HR@10 | NDCG@10 |
|---|---|---|
| MAPS (Full) | 0.7071 | 0.4359 |
| w/o LLM | 0.6527 (-7.7%) | 0.3968 |
| w/o MoAE | 0.6781 (-4.1%) | 0.4096 |
| w/o General Alignment | 0.6198 (-12.3%) | 0.3669 |
| w/o Personalized Alignment | 0.6334 (-10.4%) | 0.3732 |
- The general alignment module contributes the most (removing it drops HR@10 by 12.3%), followed by personalized alignment (10.4%).
- Removing LLM embeddings or MoAE pooling lowers HR@10 by 7.7% and 4.1%, respectively.
- Both search history motivation (\(e^S\)) and consultation motivation (\(e^C\)) make independent contributions to the overall performance.
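The percentage drops in the ablation table can be re-derived from the HR@10 column, which also makes the ranking of module contributions explicit. The short check below uses only the table values; the variable names are illustrative.

```python
FULL_HR10 = 0.7071

# HR@10 after removing each component, copied from the ablation table.
ablations = {
    "w/o LLM": 0.6527,
    "w/o MoAE": 0.6781,
    "w/o General Alignment": 0.6198,
    "w/o Personalized Alignment": 0.6334,
}

# Relative drop (in percent) with respect to the full model.
drops = {name: round((FULL_HR10 - hr) / FULL_HR10 * 100, 1)
         for name, hr in ablations.items()}

# Components ordered from largest to smallest impact on HR@10.
ranked = sorted(drops, key=drops.get, reverse=True)
```

Sorting by drop puts general alignment first and MoAE pooling last, matching the ordering discussed in the findings.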
Highlights & Insights
- Problem Discovery: For the first time, this work reveals from a data-driven perspective that "pre-search consultation" encodes search motivation, opening up a new research direction.
- MoAE Pooling: The Mixture of Attention Experts design lets different experts emphasize different semantics, accommodating the varying focus of each text type.
- Dual-Alignment Design: General alignment ensures the unification of the ID-text space, while personalized alignment captures user-specific motivations.
- Real-World Commercial Validation: Evaluation spans both public datasets and real-world e-commerce platform data.
- Consultation Simulation Strategy: Simulated consultation texts were generated using GPT-4o for the Amazon dataset, expanding the range of applicable scenarios.
Limitations & Future Work
- Relies on the platform providing AI consultation services; platforms lacking consultation data cannot use it directly.
- The consultation data for the Amazon dataset is simulated by GPT-4o and may exhibit distribution shift relative to real conversations.
- Freezing LLM embeddings may restrict domain adaptation capabilities.
- The computational overhead of MoAE pooling during real-time inference is not discussed in detail.
- Evaluated only on e-commerce search, without exploring other search scenarios (e.g., academic search, legal retrieval).
Related Work & Insights
- Personalized Search: HEM, AEM, QEM, ZAM, TEM, CoPPS
- Multi-scenario Search: SESRec (Search + Recommendation Contrastive Learning), UniSAR (Transformer Cross-Attention)
- Conversational Retrieval: CHIQ
- Dense Retrieval: BGE-M3
Rating
- Novelty: ⭐⭐⭐⭐⭐ — Modeling search motivation for the first time; highly pioneering problem formulation.
- Technical Depth: ⭐⭐⭐⭐ — Complete system design with MoAE and dual-alignment.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensive evaluation combining real-world commercial data and public data, across ranking, retrieval, and ablation.
- Value: ⭐⭐⭐⭐ — Directly applicable to e-commerce platforms with AI customer service.
- Overall Rating: ⭐⭐⭐⭐ — Strongly problem-driven work, solid experiments, and high practical value.