MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment

Conference: ACL 2025
arXiv: 2503.01711
Area: LLM NLP
Keywords: Personalized Search, Search Motivation, Consultation Alignment, Mixture of Attention Experts, E-commerce Search

TL;DR

This work is the first to model "search motivation"—the genuine user needs latent in pre-search consultation behaviors—and proposes the MAPS framework that integrates LLM semantics, MoAE pooling, and dual-alignment mechanisms, improving HR@10 by 24.4% (from 0.5685 to 0.7071) on real-world commercial data.

Background & Motivation

Core Finding: Analyzing real-world e-commerce platform data reveals that a significant proportion of users consult (e.g., with AI customer service) before searching, and these consultations implicitly contain the true motivations behind search queries.
Limitations of Prior Work:
- Traditional personalized search assumes that the query fully expresses the user's intent.
- In reality, a user who searches for "X-600" may not be sure it is the best choice and often needs several searches and comparisons.
- The query is therefore only a proxy; the search motivation is the underlying need that the results must actually satisfy.
Key Challenges:
- Query Alignment: Consultations are long natural-language texts while search queries are short keywords, so the two lie far apart in semantic space.
- Item Feature Alignment: Items are described by categorical attributes, whereas consultations are in natural language, creating a modality gap.
- User History Alignment: Not all consultation history is relevant to the current search, so irrelevant history must be filtered out as noise.

Method

Overall Architecture

MAPS consists of three core modules:
1. ID-Text Representation Fusion: Leverages LLM embeddings and MoAE pooling.
2. Mapping-based General Alignment: Aligns token-item relationships via contrastive learning.
3. Sequential Personalized Alignment: Fuses motivation awareness through bidirectional attention.

Key Designs

1. Mixture of Attention Experts (MoAE) Pooling

Three types of attention experts, each with \(N_E\) members:
- Parametric Attention Pooling: Maintains learnable query vectors and computes attention weights over input tokens.
- Self-Attention Pooling: Performs token-to-token query-key attention weighting.
- Search-Centric Cross-Attention Pooling: Uses the search query embedding as the attention query, directing other textual tokens to focus on search-relevant semantics.

A Top-K gating routing mechanism selects and activates experts, fusing them via weighted summation to obtain the text embedding.
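The pooling scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it uses one expert of each type instead of \(N_E\), and the gate input (the mean token), the mean-pooling after self-attention, and the softmax renormalization over the selected experts are all assumptions. The names `attend`, `moae_pool`, and `gate_weights` are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, tokens):
    """Scaled dot-product attention pooling: one query over a list of token vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]

def parametric_expert(learned_query):
    # Expert that owns a (normally learnable) query vector.
    return lambda tokens, search_query: attend(learned_query, tokens)

def self_attention_expert():
    # Every token attends to all tokens; the contextualized tokens are mean-pooled (assumption).
    def pool(tokens, search_query):
        ctx = [attend(t, tokens) for t in tokens]
        return [sum(col) / len(ctx) for col in zip(*ctx)]
    return pool

def search_centric_expert():
    # The search query embedding acts as the attention query.
    return lambda tokens, search_query: attend(search_query, tokens)

def moae_pool(tokens, search_query, experts, gate_weights, top_k=2):
    """Route to the top-k experts and return the gate-weighted sum of their outputs."""
    mean_token = [sum(col) / len(tokens) for col in zip(*tokens)]
    logits = [dot(w, mean_token) for w in gate_weights]  # one gate logit per expert
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # renormalize over the chosen experts
    outputs = [experts[i](tokens, search_query) for i in chosen]
    dim = len(tokens[0])
    return [sum(g * out[d] for g, out in zip(gates, outputs)) for d in range(dim)]

random.seed(0)
dim = 4
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
search_query = [random.gauss(0, 1) for _ in range(dim)]
experts = [
    parametric_expert([random.gauss(0, 1) for _ in range(dim)]),
    self_attention_expert(),
    search_centric_expert(),
]
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in experts]
embedding = moae_pool(tokens, search_query, experts, gate_weights, top_k=2)
```

Because every expert returns a convex combination of the input tokens and the gates sum to one, the pooled embedding stays inside the per-dimension range of the tokens, which is a convenient sanity check for a sketch like this.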

2. Mapping-based General Alignment

- Collects the item's multi-scenario text set (searches, consultations, titles, ads, reviews, etc.).
- Noise texts are filtered based on frequency thresholds, establishing token-item mappings.
- Bidirectional contrastive loss \(\mathcal{L}_{GA}\):
  - Direction 1: Given an item, pull closer correct tokens and push away incorrect tokens.
  - Direction 2: Given a token, pull closer correct items and push away incorrect items.
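As a rough illustration of the two contrastive directions, the sketch below uses an InfoNCE-style objective over dot-product similarities. The exact form of \(\mathcal{L}_{GA}\) is not given in these notes, so the loss shape, the temperature value, and the names `info_nce` and `general_alignment_loss` are assumptions made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, candidates, positive_ids, temperature=0.1):
    """InfoNCE-style loss: -log(score mass on positives / score mass on all candidates)."""
    scores = [math.exp(dot(anchor, c) / temperature) for c in candidates]
    pos = sum(scores[i] for i in positive_ids)
    return -math.log(pos / sum(scores))

def general_alignment_loss(item_embs, token_embs, mapping, temperature=0.1):
    """mapping: set of (item_index, token_index) pairs kept after frequency filtering."""
    losses = []
    # Direction 1: given an item, pull its mapped tokens closer than the other tokens.
    for i, item in enumerate(item_embs):
        pos = [t for (it, t) in mapping if it == i]
        if pos:
            losses.append(info_nce(item, token_embs, pos, temperature))
    # Direction 2: given a token, pull its mapped items closer than the other items.
    for t, token in enumerate(token_embs):
        pos = [it for (it, tt) in mapping if tt == t]
        if pos:
            losses.append(info_nce(token, item_embs, pos, temperature))
    return sum(losses) / len(losses)

# Toy data: two items, two tokens, item i is mapped to token i.
items = [[1.0, 0.0], [0.0, 1.0]]
aligned_tokens = [[1.0, 0.0], [0.0, 1.0]]
shuffled_tokens = [[0.0, 1.0], [1.0, 0.0]]
mapping = {(0, 0), (1, 1)}
loss_aligned = general_alignment_loss(items, aligned_tokens, mapping)
loss_shuffled = general_alignment_loss(items, shuffled_tokens, mapping)
```

In this toy setup the loss is close to zero when mapped tokens and items share a direction and grows large when the token embeddings are swapped, which is the behavior the two "pull closer / push away" directions are meant to produce.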

3. Sequential Personalized Alignment

- Consultation Motivation Extraction: Uses the current query as an anchor, inputting it along with the consultation history into a Transformer Encoder.
  - The output at position 0 is taken as the consultation motivation embedding \(e^C\).
- Search History Encoding: Processes the search query history in the same manner to obtain \(e^S\).
- Motivation-Aware Query Fusion: \(e' = \alpha_1 e^C + \alpha_2 e^S + \alpha_3 e_{query}\) (with learnable weights).
- Final Ranking: Interacts the motivation-aware query with the item history via Encoder + user embedding \(\to\) \(p(v|s,H,u)\).
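The fusion step is a weighted sum and can be illustrated directly. In the sketch below, normalizing the three weights with a softmax is an assumption (the notes only say the weights are learnable), and `rank_items` is a deliberately simplified stand-in for the Encoder and user-embedding interaction: it scores items by dot product with the fused query. All function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_query(e_c, e_s, e_query, raw_weights):
    """e' = a1*e_c + a2*e_s + a3*e_query, with weights normalized by softmax (assumption)."""
    a1, a2, a3 = softmax(raw_weights)
    return [a1 * c + a2 * s + a3 * q for c, s, q in zip(e_c, e_s, e_query)]

def rank_items(fused_query, item_embs):
    """Stand-in scorer: softmax over dot products, best item first."""
    logits = [sum(q * v for q, v in zip(fused_query, item)) for item in item_embs]
    probs = softmax(logits)
    order = sorted(range(len(item_embs)), key=lambda i: probs[i], reverse=True)
    return order, probs

e_c = [1.0, 0.0]       # consultation motivation embedding
e_s = [0.0, 1.0]       # search history embedding
e_query = [1.0, 1.0]   # current query embedding
fused = fuse_query(e_c, e_s, e_query, raw_weights=[0.0, 0.0, 0.0])
order, probs = rank_items(fused, item_embs=[[1.0, 1.0], [-1.0, -1.0], [1.0, 0.0]])
```

With equal raw weights each source contributes one third, so the fused query is `[2/3, 2/3]` and the item most aligned with it is ranked first.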

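Total Loss: \(\mathcal{L} = \mathcal{L}_{PA} + \lambda_3 \mathcal{L}_{GA} + \lambda_4 \|\Theta\|_2^2\), where \(\mathcal{L}_{PA}\) is the personalized alignment loss, \(\mathcal{L}_{GA}\) is the general alignment loss, and the last term is L2 regularization over the model parameters \(\Theta\).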

Key Experimental Results

Main Results

Datasets:
- Commercial: Real e-commerce platform data with 2,096 users, 2,691 items, and 24,662 search interactions.
- Amazon: Amazon Reviews (PersonalWAB version), with simulated consultation text generated by GPT-4o.

Ranking Performance (Commercial Dataset):

Model	HR@5	HR@10	NDCG@5	NDCG@10
TEM	0.4041	0.5685	0.2871	0.3402
CoPPS	0.4050	0.5637	0.2831	0.3445
MAPS	0.5281	0.7071	0.3780	0.4359

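HR@10 improves by 24.4% over TEM, the strongest baseline on that metric (0.5685 \(\to\) 0.7071). The reported 28.1% NDCG@10 gain is also measured against TEM (0.3402 \(\to\) 0.4359); against CoPPS, which has the higher baseline NDCG@10 (0.3445), the gain is 26.5%.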
All comparisons against baselines are significant with \(p < 0.05\).
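For readers unfamiliar with the two ranking metrics, the sketch below shows one common way to compute HR@K and NDCG@K. It assumes exactly one relevant item per search (so the ideal DCG is 1), which may differ from the paper's evaluation protocol; the function names and toy data are hypothetical.

```python
import math

def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of searches whose target item appears in the top-k results."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def ndcg_at_k(ranked_lists, targets, k):
    """NDCG with a single relevant item: 1 / log2(rank + 1) if the target is in the top-k."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            total += 1.0 / math.log2(top.index(t) + 2)  # index 0 is rank 1
    return total / len(targets)

# Toy example: three searches, the third target is never retrieved.
ranked_lists = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
targets = ["a", "a", "z"]
hr2 = hit_rate_at_k(ranked_lists, targets, k=2)
hr3 = hit_rate_at_k(ranked_lists, targets, k=3)
ndcg3 = ndcg_at_k(ranked_lists, targets, k=3)
```

HR@K only asks whether the target was retrieved, while NDCG@K also rewards placing it higher, which is why the two columns in the table can rank baselines differently.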

Amazon Dataset:

Model	HR@5	HR@10	NDCG@5	NDCG@10
CoPPS	0.3870	0.4854	0.2788	0.3298
MAPS	0.5832	0.7735	0.4059	0.4676

Relative to CoPPS, HR@10 improves by 59.3% (0.4854 \(\to\) 0.7735) and NDCG@10 by 41.8% (0.3298 \(\to\) 0.4676).

Retrieval Performance (Commercial, MRR@10):

Method	MRR@10
BM25	0.2529
BGE-M3	0.2976
CHIQ	0.3192
MAPS	0.3805
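MRR@10, the retrieval metric in the table above, averages the reciprocal rank of the target item, counting zero when it falls outside the top 10. The sketch below is a generic definition under the assumption of one relevant item per search; `mrr_at_k` and the toy data are hypothetical.

```python
def mrr_at_k(ranked_lists, targets, k=10):
    """Mean reciprocal rank, truncated at k; misses contribute 0."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            total += 1.0 / (top.index(t) + 1)  # index 0 is rank 1
    return total / len(targets)

# Toy example: target at rank 1, at rank 3, and not retrieved.
ranked_lists = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
targets = ["a", "a", "z"]
mrr = mrr_at_k(ranked_lists, targets, k=10)
```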

Comparison with Multi-scenario Methods (Commercial):

Method	HR@10	NDCG@10
UniSAR	0.5838	0.3577
MAPS	0.7071	0.4359

Key Findings

Ablation Study (Commercial):

Configuration	HR@10 (change vs. Full)	NDCG@10
MAPS (Full)	0.7071	0.4359
w/o LLM	0.6527 (-7.7%)	0.3968
w/o MoAE	0.6781 (-4.1%)	0.4096
w/o General Alignment	0.6198 (-12.3%)	0.3669
w/o Personalized Alignment	0.6334 (-10.4%)	0.3732

The general alignment module contributes the most (removing it drops HR@10 by 12.3%), followed by personalized alignment (10.4%).
Removing LLM embeddings or MoAE pooling costs 7.7% and 4.1% of HR@10, respectively.
Both search history motivation (\(e^S\)) and consultation motivation (\(e^C\)) make independent contributions to the overall performance.
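The percentages in the ablation table are relative changes with respect to the full model. The short check below recomputes them from the HR@10 values in the table; the helper name `relative_change` is hypothetical.

```python
def relative_change(value, reference):
    """Relative change of value with respect to reference, in percent."""
    return (value - reference) / reference * 100

full_hr10 = 0.7071
ablation_hr10 = {
    "w/o LLM": 0.6527,
    "w/o MoAE": 0.6781,
    "w/o General Alignment": 0.6198,
    "w/o Personalized Alignment": 0.6334,
}
drops = {name: round(relative_change(v, full_hr10), 1) for name, v in ablation_hr10.items()}
# drops == {"w/o LLM": -7.7, "w/o MoAE": -4.1,
#           "w/o General Alignment": -12.3, "w/o Personalized Alignment": -10.4}
```

The same helper reproduces the headline gain: `relative_change(0.7071, 0.5685)` rounds to 24.4.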

Highlights & Insights

Problem Discovery: For the first time, this work reveals from a data-driven perspective that "pre-search consultation" encodes search motivation, opening up a new research direction.
MoAE Pooling: The Mixture of Attention Experts design elegantly resolves the differences in semantic emphasis across various text types.
Dual-Alignment Design: General alignment ensures the unification of the ID-text space, while personalized alignment captures user-specific motivations.
Real-World Commercial Validation: Evaluation spans both public datasets and real-world e-commerce platform data.
Consultation Simulation Strategy: Simulated consultation texts were generated using GPT-4o for the Amazon dataset, expanding the range of applicable scenarios.

Limitations & Future Work

Relies on the platform providing AI consultation services; platforms lacking consultation data cannot use it directly.
Amazon's consultation data is simulated by GPT-4o, which may exhibit distribution shifts compared to real conversations.
Freezing LLM embeddings may restrict domain adaptation capabilities.
The computational overhead of MoAE pooling during real-time inference is not discussed in detail.
Evaluated only on e-commerce search, without exploring other search scenarios (e.g., academic search, legal retrieval).

Related Work

Personalized Search: HEM, AEM, QEM, ZAM, TEM, CoPPS
Multi-scenario Search: SESRec (Search + Recommendation Contrastive Learning), UniSAR (Transformer Cross-Attention)
Conversational Retrieval: CHIQ
Dense Retrieval: BGE-M3

Rating

Novelty: ⭐⭐⭐⭐⭐ — Modeling search motivation for the first time; highly pioneering problem formulation.
Technical Depth: ⭐⭐⭐⭐ — Complete system design with MoAE and dual-alignment.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensive evaluation combining real-world commercial data and public data, across ranking, retrieval, and ablation.
Value: ⭐⭐⭐⭐ — Directly applicable to e-commerce platforms with AI customer service.
Overall Rating: ⭐⭐⭐⭐ — Strongly problem-driven work, solid experiments, and high practical value.