MAPS: Motivation-Aware Personalized Search via LLM-Driven Consultation Alignment

Conference: ACL 2025
arXiv: 2503.01711
Area: LLM NLP
Keywords: Personalized Search, Search Motivation, Consultation Alignment, Mixture of Attention Experts, E-commerce Search

TL;DR

This work is the first to model "search motivation"—the genuine user needs latent in pre-search consultation behaviors—and proposes the MAPS framework that integrates LLM semantics, MoAE pooling, and dual-alignment mechanisms, improving HR@10 by 24.4% (from 0.5685 to 0.7071) on real-world commercial data.

Background & Motivation

  • Core Finding: Analyzing real-world e-commerce platform data reveals that a significant proportion of users consult (e.g., with AI customer service) before searching, and these consultations implicitly contain the true motivations behind search queries.
  • Limitations of Prior Work:
    • Traditional personalized search assumes that the query completely expresses the user's intent.
    • In reality, a user searching for "X-600" might not be certain that this is the optimal choice, requiring multiple searches and comparisons.
    • Search motivation is the genuine need that actually needs to be satisfied.
  • Key Challenge: Consultation text must be aligned with three different targets:
    • Query Alignment: Consultations are long free text while search queries are short keywords, which leaves a large gap in semantic space.
    • Item Feature Alignment: Items are described by categorical attributes, whereas consultations are in natural language, creating a modality gap.
    • User History Alignment: Not all consultation history is relevant to the current search, so noise must be filtered out.

Method

Overall Architecture

MAPS consists of three core modules:

  1. ID-Text Representation Fusion: Leverages LLM embeddings and MoAE pooling.
  2. Mapping-based General Alignment: Aligns token-item relationships via contrastive learning.
  3. Sequential Personalized Alignment: Fuses motivation awareness through bidirectional attention.

Key Designs

1. Mixture of Attention Experts (MoAE) Pooling

Three types of attention experts, each with \(N_E\) members:

  • Parametric Attention Pooling: Maintains learnable query vectors and computes attention weights over input tokens.
  • Self-Attention Pooling: Performs token-to-token query-key attention weighting.
  • Search-Centric Cross-Attention Pooling: Uses the search query embedding as the attention query, directing other textual tokens to focus on search-relevant semantics.

A Top-K gating mechanism selects which experts to activate and fuses their outputs via weighted summation to obtain the text embedding.
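To make the pooling idea concrete, here is a minimal pure-Python sketch of the three expert types and Top-K gating. It is an illustration under stated assumptions, not the paper's implementation: it uses one expert per type instead of \(N_E\), feeds the gate with the mean token vector, averages the contextualized tokens in the self-attention expert, and uses random vectors in place of learned parameters. All function names are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, tokens):
    """Pool token vectors using softmax(query . token / sqrt(d)) weights."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, tok) / scale for tok in tokens])
    return [sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(len(tokens[0]))]

def make_parametric_expert(learned_query):
    # In training this query vector would be a learnable parameter.
    return lambda tokens, search_query: attend(learned_query, tokens)

def self_attention_expert(tokens, search_query):
    # Every token attends over all tokens; the results are averaged (assumption).
    contextualized = [attend(tok, tokens) for tok in tokens]
    return [sum(col) / len(contextualized) for col in zip(*contextualized)]

def search_centric_expert(tokens, search_query):
    # The search query embedding serves as the attention query.
    return attend(search_query, tokens)

def moae_pool(tokens, search_query, experts, gate_vectors, top_k=2):
    """Activate the top_k experts by gate score and sum their weighted outputs."""
    mean_token = [sum(col) / len(tokens) for col in zip(*tokens)]
    gate_scores = [dot(g, mean_token) for g in gate_vectors]  # assumed gate input
    chosen = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)[:top_k]
    gate_weights = softmax([gate_scores[i] for i in chosen])
    outputs = [experts[i](tokens, search_query) for i in chosen]
    return [sum(w * out[d] for w, out in zip(gate_weights, outputs))
            for d in range(len(mean_token))]

# Toy usage with random stand-ins for token embeddings and parameters.
random.seed(0)
dim = 4
rand_vec = lambda: [random.gauss(0, 1) for _ in range(dim)]
tokens = [rand_vec() for _ in range(5)]
search_query = rand_vec()
experts = [make_parametric_expert(rand_vec()),
           self_attention_expert,
           search_centric_expert]
gate_vectors = [rand_vec() for _ in experts]
pooled = moae_pool(tokens, search_query, experts, gate_vectors, top_k=2)
print(pooled)
```

Because every expert in this sketch outputs a convex combination of the input tokens and the gate weights sum to one, the pooled vector stays inside the per-dimension range of the tokens, which is a handy sanity check.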

2. Mapping-based General Alignment

  • Collects each item's multi-scenario text set (searches, consultations, titles, ads, reviews, etc.).
  • Filters noisy texts using frequency thresholds, then establishes token-item mappings.
  • Applies a bidirectional contrastive loss \(\mathcal{L}_{GA}\):
    • Direction 1: Given an item, pull correct tokens closer and push incorrect tokens away.
    • Direction 2: Given a token, pull correct items closer and push incorrect items away.
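The two steps above (frequency filtering, then a two-directional contrastive objective) can be sketched as follows. This is a hedged illustration: the source does not give the exact filter or loss form, so the sketch assumes that tokens below a minimum count are dropped and uses an InfoNCE-style loss with a temperature. The function names, `min_count`, `temperature`, and the toy texts are all made up for the example. For brevity, each mapped pair treats every other candidate as a negative, even if an item has several correct tokens.

```python
import math
from collections import Counter

def build_token_item_map(item_texts, min_count=2):
    """Map each item to the tokens seen at least min_count times in its texts."""
    mapping = {}
    for item, texts in item_texts.items():
        counts = Counter(tok for text in texts for tok in text.lower().split())
        mapping[item] = {tok for tok, n in counts.items() if n >= min_count}
    return mapping

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, candidates, positive_idx, temperature=0.1):
    """Negative log-softmax score of the positive candidate."""
    logits = [dot(anchor, cand) / temperature for cand in candidates]
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_norm - logits[positive_idx]

def general_alignment_loss(item_embs, token_embs, pairs, temperature=0.1):
    """Average item->token plus token->item contrastive loss over mapped pairs."""
    total = 0.0
    for item_idx, token_idx in pairs:
        # Direction 1: given the item, score its token against all tokens.
        total += info_nce(item_embs[item_idx], token_embs, token_idx, temperature)
        # Direction 2: given the token, score its item against all items.
        total += info_nce(token_embs[token_idx], item_embs, item_idx, temperature)
    return total / len(pairs)

# Toy texts for a hypothetical item; rare tokens are treated as noise.
token_map = build_token_item_map(
    {"X-600": ["x-600 vacuum quiet", "quiet vacuum for pets", "vacuum"]})
print(token_map)

# Two items and two tokens in a 2-d embedding space.
item_embs = [[1.0, 0.0], [0.0, 1.0]]
token_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned = general_alignment_loss(item_embs, token_embs, [(0, 0), (1, 1)])
misaligned = general_alignment_loss(item_embs, token_embs, [(0, 1), (1, 0)])
print(aligned, misaligned)
```

When the mapped pairs match the embedding geometry the loss is close to zero, and it grows sharply when the mapping contradicts it, which is the behavior the alignment objective relies on.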

3. Sequential Personalized Alignment

  • Consultation Motivation Extraction: Uses the current query as an anchor and feeds it, together with the consultation history, into a Transformer Encoder. The output at position 0 is taken as the consultation motivation embedding \(e^C\).
  • Search History Encoding: Processes the search query history in the same manner to obtain \(e^S\).
  • Motivation-Aware Query Fusion: \(e' = \alpha_1 e^C + \alpha_2 e^S + \alpha_3 e_{query}\), where \(\alpha_1, \alpha_2, \alpha_3\) are learnable weights.
  • Final Ranking: The motivation-aware query interacts with the item history through an Encoder, together with the user embedding, to produce \(p(v|s,H,u)\).
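The fusion formula is simple enough to check by hand. The sketch below computes \(e'\) for toy vectors and then turns it into a distribution over candidate items. The scoring step is only a stand-in: it uses a dot product and a softmax instead of the Encoder over item history and the user embedding described above, and the weight values, vectors, and function names are hypothetical.

```python
import math

def fuse_query(e_c, e_s, e_query, alphas):
    """Weighted sum e' = a1*e^C + a2*e^S + a3*e_query."""
    a1, a2, a3 = alphas
    return [a1 * c + a2 * s + a3 * q for c, s, q in zip(e_c, e_s, e_query)]

def item_probabilities(query_vec, item_embs):
    """Softmax over dot-product scores (simplified stand-in for the ranker)."""
    logits = [sum(q * v for q, v in zip(query_vec, item)) for item in item_embs]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

e_c = [1.0, 0.0]      # consultation motivation embedding (toy values)
e_s = [0.0, 1.0]      # search history embedding (toy values)
e_query = [1.0, 1.0]  # current query embedding (toy values)
alphas = (0.5, 0.3, 0.2)  # would be learned during training

fused = fuse_query(e_c, e_s, e_query, alphas)
probs = item_probabilities(fused, [[1.0, 0.0], [0.0, 1.0]])
print(fused, probs)
```

With these toy weights the consultation signal dominates, so the fused query leans toward the first dimension and the first item receives the higher probability.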

Total Loss: \(\mathcal{L} = \mathcal{L}_{PA} + \lambda_3 \mathcal{L}_{GA} + \lambda_4 ||\Theta||_2^2\)

Key Experimental Results

Main Results

Datasets:

  • Commercial: Real e-commerce platform with 2,096 users, 2,691 items, and 24,662 search interactions.
  • Amazon: Amazon Reviews (PersonalWAB version), with simulated consultation text generated by GPT-4o.

Ranking Performance (Commercial Dataset):

Model HR@5 HR@10 NDCG@5 NDCG@10
TEM 0.4041 0.5685 0.2871 0.3402
CoPPS 0.4050 0.5637 0.2831 0.3445
MAPS 0.5281 0.7071 0.3780 0.4359
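  • HR@10 improved by 24.4% over TEM, the strongest baseline on that metric (0.5685 \(\to\) 0.7071). NDCG@10 improved by 28.1% over TEM (0.3402 \(\to\) 0.4359), or 26.5% over CoPPS (0.3445), the strongest baseline on NDCG@10.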
  • All comparisons against baselines are significant with \(p < 0.05\).
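For readers who want to reproduce the metric arithmetic, the following sketch shows one common way to compute HR@K and NDCG@K when each search has a single target item, plus the relative-gain calculation behind the percentages quoted above. The single-target assumption and the function names are ours; the paper's evaluation script may differ in details such as candidate sampling.

```python
import math

def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of searches whose target item appears in the top k."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def ndcg_at_k(ranked_lists, targets, k):
    """NDCG with one relevant item per search, so the ideal DCG is 1."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            position = top.index(t)  # 0-based rank of the target
            total += 1.0 / math.log2(position + 2)
    return total / len(targets)

def relative_gain(baseline, ours):
    """Relative improvement in percent."""
    return (ours - baseline) / baseline * 100

# Toy rankings: the third search never retrieves its target.
ranked_lists = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
targets = ["a", "a", "z"]
hr2 = hit_rate_at_k(ranked_lists, targets, k=2)
ndcg3 = ndcg_at_k(ranked_lists, targets, k=3)

# Figures taken from the Commercial table (TEM vs. MAPS).
hr10_gain = relative_gain(0.5685, 0.7071)
ndcg10_gain = relative_gain(0.3402, 0.4359)
print(hr2, ndcg3, round(hr10_gain, 1), round(ndcg10_gain, 1))
```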

Amazon Dataset:

Model HR@5 HR@10 NDCG@5 NDCG@10
CoPPS 0.3870 0.4854 0.2788 0.3298
MAPS 0.5832 0.7735 0.4059 0.4676
  • HR@10 improved by 59.3%, NDCG@10 improved by 41.8%.

Retrieval Performance (Commercial, MRR@10):

Method MRR@10
BM25 0.2529
BGE-M3 0.2976
CHIQ 0.3192
MAPS 0.3805
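
As a reminder of what the retrieval metric measures, MRR@10 averages the reciprocal rank of the target item, counting zero when the target falls outside the top 10. A minimal sketch with a hypothetical function name and toy data:

```python
def mrr_at_k(ranked_lists, targets, k=10):
    """Mean reciprocal rank of the target item, truncated at rank k."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        top = ranked[:k]
        if target in top:
            total += 1.0 / (top.index(target) + 1)  # ranks are 1-based
    return total / len(targets)

# Target found at rank 1, at rank 2, and not at all.
ranked_lists = [["a", "b"], ["b", "a"], ["c", "d"]]
targets = ["a", "a", "a"]
mrr = mrr_at_k(ranked_lists, targets, k=10)
print(mrr)
```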

vs Multi-scenario Methods (Commercial):

Method HR@10 NDCG@10
UniSAR 0.5838 0.3577
MAPS 0.7071 0.4359

Key Findings

Ablation Study (Commercial):

Configuration HR@10 NDCG@10
MAPS (Full) 0.7071 0.4359
w/o LLM 0.6527 (-7.7%) 0.3968
w/o MoAE 0.6781 (-4.1%) 0.4096
w/o General Alignment 0.6198 (-12.3%) 0.3669
w/o Personalized Alignment 0.6334 (-10.4%) 0.3732
  • The general alignment module contributes the most (removing it drops HR@10 by 12.3%), followed by personalized alignment (10.4%).
  • Removing LLM embeddings costs 7.7% HR@10, and removing MoAE pooling costs 4.1%.
  • Both search history motivation (\(e^S\)) and consultation motivation (\(e^C\)) make independent contributions to the overall performance.
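
The percentages in the ablation table are relative drops in HR@10 with respect to the full model. The short check below recomputes them from the table values and ranks the components by impact; the dictionary keys simply mirror the table rows, and the helper name is ours.

```python
FULL_HR10 = 0.7071

# HR@10 after removing each component (values from the ablation table).
ablated_hr10 = {
    "LLM": 0.6527,
    "MoAE": 0.6781,
    "General Alignment": 0.6198,
    "Personalized Alignment": 0.6334,
}

def relative_drop(full, ablated):
    """Relative decrease in percent when a component is removed."""
    return (full - ablated) / full * 100

drops = {name: round(relative_drop(FULL_HR10, score), 1)
         for name, score in ablated_hr10.items()}
ranking = sorted(drops, key=drops.get, reverse=True)
print(drops)
print(ranking)
```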

Highlights & Insights

  1. Problem Discovery: For the first time, this work reveals from a data-driven perspective that "pre-search consultation" encodes search motivation, opening up a new research direction.
  2. MoAE Pooling: The Mixture of Attention Experts design lets different pooling experts handle the differing semantic emphasis of each text type.
  3. Dual-Alignment Design: General alignment ensures the unification of the ID-text space, while personalized alignment captures user-specific motivations.
  4. Real-World Commercial Validation: Evaluation spans both public datasets and real-world e-commerce platform data.
  5. Consultation Simulation Strategy: Simulated consultation texts were generated using GPT-4o for the Amazon dataset, expanding the range of applicable scenarios.

Limitations & Future Work

  • Relies on the platform providing AI consultation services; platforms lacking consultation data cannot use it directly.
  • Amazon's consultation data is simulated by GPT-4o, which may exhibit distribution shifts compared to real conversations.
  • Freezing LLM embeddings may restrict domain adaptation capabilities.
  • The computational overhead of MoAE pooling during real-time inference is not discussed in detail.
  • Evaluated only on e-commerce search, without exploring other search scenarios (e.g., academic search, legal retrieval).

Related Work

  • Personalized Search: HEM, AEM, QEM, ZAM, TEM, CoPPS
  • Multi-scenario Search: SESRec (Search + Recommendation Contrastive Learning), UniSAR (Transformer Cross-Attention)
  • Conversational Retrieval: CHIQ
  • Dense Retrieval: BGE-M3

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — Modeling search motivation for the first time; highly pioneering problem formulation.
  • Technical Depth: ⭐⭐⭐⭐ — Complete system design with MoAE and dual-alignment.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensive evaluation combining real-world commercial data and public data, across ranking, retrieval, and ablation.
  • Value: ⭐⭐⭐⭐ — Directly applicable to e-commerce platforms with AI customer service.
  • Overall Rating: ⭐⭐⭐⭐ — Strongly problem-driven work, solid experiments, and high practical value.