🎁 Recommender Systems
💬 ACL2026 · 22 paper notes
📌 Same area in other venues: 🔬 ICLR2026 (24) · 🧪 ICML2026 (11) · 🤖 AAAI2026 (27) · 🧠 NeurIPS2025 (24)
🔥 Top topics: Recommendation ×12 · Personalized Generation ×5 · Dialogue ×4 · LLM ×3 · Reasoning ×3
- Bridging Language and Items for Retrieval and Recommendation: Benchmarking LLMs as Semantic Encoders
-
This paper introduces the large-scale Amazon Reviews 2023 dataset (570M reviews / 48M items) and constructs the BLaIR benchmark. Covering sequential recommendation, collaborative filtering, and item search (short and complex queries), the study benchmarks 11 top-tier LLMs as semantic encoders. It finds that model rankings on BLaIR do not track MTEB rankings (Spearman -0.476, a moderate negative correlation rather than agreement), highlighting the distinct requirements that recommendation scenarios place on semantic encoders.
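
To make the Spearman figure concrete, here is a minimal sketch of how a rank correlation between two leaderboards can be computed. The encoder names and ranks are invented for illustration and are not the paper's data; the formula assumes rankings without ties.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(rank_a)
    # Sum of squared rank differences over the shared items.
    d2 = sum((rank_a[name] - rank_b[name]) ** 2 for name in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical ranks (1 = best) on two benchmarks; NOT taken from the paper.
mteb_rank = {"enc_a": 1, "enc_b": 2, "enc_c": 3, "enc_d": 4, "enc_e": 5}
blair_rank = {"enc_a": 4, "enc_b": 5, "enc_c": 2, "enc_d": 3, "enc_e": 1}

rho = spearman_from_ranks(mteb_rank, blair_rank)
print(round(rho, 3))  # negative: encoders that rank well on one list tend to rank poorly on the other
```

A value near 0 would mean the two leaderboards are unrelated, while a clearly negative value means they tend to disagree.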
- ClusterRAG: Cluster-Based Collaborative Filtering for Personalized Retrieval-Augmented Generation
-
ClusterRAG introduces collaborative filtering into personalized RAG by constructing user representations from historical documents and clustering them with HDBSCAN. It hierarchically retrieves profile documents from both the target user and similar users to compose prompts. In its hybrid mode, it outperforms vanilla RAG, LaMP-IPA, ROPG, and CFRAG across the LaMP multi-task benchmark.
- Culinary Crossroads: A RAG Framework for Enhancing Diversity in Cross-Cultural Recipe Adaptation
-
The authors observe that standard RAG "produces non-diverse outputs even when given diverse contexts" in creative tasks. They design CARRIAGE, a plug-and-play framework featuring query rewriting, diversity-aware MMR re-ranking, sliding-window dynamic context, and contrastive context injection. The framework carries "contextual diversity" over into "output diversity," improving lexical, semantic, and ingredient diversity as well as CultureScore in Spanish cross-national recipe adaptation, and reaching a Pareto-efficient trade-off relative to closed-book LLMs.
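
For readers unfamiliar with MMR (Maximal Marginal Relevance), the sketch below shows the standard greedy form of the re-ranking step. It is not CARRIAGE's implementation: the function names, the cosine similarity, and the `lam` trade-off parameter are assumptions chosen for illustration, and the paper's diversity-aware variant may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_rerank(query_vec, doc_vecs, k, lam=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy vectors (hypothetical): docs 0 and 1 are near-duplicates, doc 2 is different.
query = [1.0, 0.0]
docs = [[1.0, 0.10], [1.0, 0.12], [0.6, 0.8]]
diverse = mmr_rerank(query, docs, k=2, lam=0.3)
relevance_only = mmr_rerank(query, docs, k=2, lam=1.0)
```

With a low `lam`, the second pick skips the near-duplicate in favor of the dissimilar document; with `lam=1.0`, the ranking falls back to pure relevance.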
- Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents
-
This paper proposes DECISIVE, an interactive decision-making framework that extracts objective option-scoring matrices from unstructured documents. By combining Bayesian preference inference with adaptive pairwise comparison questions, the system efficiently learns the user's latent preference vector. This yields transparent personalized recommendations while minimizing interaction burden, improving decision accuracy by up to 20% over strong baselines.
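
The loop of "ask a pairwise question, then update a belief over the preference vector" can be sketched as follows. This is an illustrative toy, not DECISIVE's algorithm: the discrete set of weight hypotheses, the Bradley-Terry style likelihood, the `beta` sharpness parameter, and the "closest to a coin flip" question heuristic are all assumptions standing in for the paper's optimal elicitation procedure.

```python
import math
from itertools import combinations

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def utility(weights, option_scores):
    """Weighted sum of an option's per-criterion scores."""
    return sum(w * s for w, s in zip(weights, option_scores))

def prefer_prob(weights, a, b, beta):
    """Bradley-Terry style chance that a user with `weights` picks a over b."""
    return sigmoid(beta * (utility(weights, a) - utility(weights, b)))

def update(posterior, hypotheses, winner, loser, beta=5.0):
    """Bayes update of the belief over weight hypotheses after one answer."""
    unnorm = [p * prefer_prob(w, winner, loser, beta)
              for p, w in zip(posterior, hypotheses)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def next_question(posterior, hypotheses, options, beta=5.0):
    """Pick the option pair whose predicted answer is closest to a coin flip."""
    def distance_from_coin_flip(pair):
        i, j = pair
        p = sum(q * prefer_prob(w, options[i], options[j], beta)
                for q, w in zip(posterior, hypotheses))
        return abs(p - 0.5)
    return min(combinations(range(len(options)), 2), key=distance_from_coin_flip)

# Hypothetical scoring matrix: rows are options, columns are (price, quality).
options = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.6]]
# Hypothetical candidate preference vectors with a uniform prior.
hypotheses = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
prior = [1 / 3] * 3

pair = next_question(prior, hypotheses, options)
# Suppose the user answers that option 1 beats option 0.
posterior = update(prior, hypotheses, winner=options[1], loser=options[0])
```

After the single answer, the belief shifts toward the quality-focused hypothesis, which is the kind of evidence accumulation the summary describes.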
- From Past To Path: Masked History Learning for Next-Item Prediction in Generative Recommendation
-
This paper proposes the Masked History Learning (MHL) training framework, which adds a masked history reconstruction auxiliary task to the autoregressive training of generative recommendation. Combined with an entropy-guided adaptive masking strategy and a curriculum learning scheduler, it shifts the model from merely predicting "what is next" to understanding "why this path was formed," significantly outperforming state-of-the-art baselines on three datasets.
- From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents
-
This paper proposes the Memora benchmark and the FAMA metric, extending long-term memory evaluation from shallow factual retrieval to memory consolidation and mutation handling across weeks to months, revealing systemic failures of existing LLMs and memory agents in handling frequent knowledge updates.
- GraphLoRA: Structure-Aware Low-Rank Adaptation for Large Language Model Recommendation
-
Existing LLM recommenders either feed collaborative information into prompts or inject pre-trained static embeddings into LoRA weights, treating structure as a "one-read" static input. GraphLoRA embeds a trainable graph message passing network into the LoRA bottleneck (between down-projection \(\mathbf{A}\) and up-projection \(\mathbf{B}\)), allowing collaborative topology to propagate dynamically within the parameter space and directly guide weight updates. With only ~1.67% additional parameters, it outperforms state-of-the-art methods such as CoRA on ML-1M and Amazon-Book.
- HARPO: Hierarchical Agentic Reasoning for User-Aligned Conversational Recommendation
-
This paper proposes the HARPO framework, which redefines conversational recommendation as a structured decision-making problem optimized for recommendation quality. It combines four components (hierarchical preference learning, value-network-guided tree search reasoning, virtual tool operations, and multi-agent refinement) and significantly outperforms existing methods on the ReDial, INSPIRED, and MUSE benchmarks.
- HORIZON: A Benchmark for in-the-wild User Behaviour Modeling
-
This paper proposes HORIZON, the first fully open-source large-scale cross-domain long-term recommendation benchmark. Based on merged Amazon Reviews, it constructs a unified interaction history containing 54M users and 35M items. It designs a four-quadrant evaluation protocol decoupled along the time axis and user dimension, revealing that models like BERT4Rec perform strongly in-distribution but significantly degrade in temporal extrapolation and unseen user scenarios. Furthermore, LLMs do not consistently outperform specialized architectures in user behavior modeling.
- HSUGA: LLM-Enhanced Recommendation with Hierarchical Semantic Understanding and Group-Aware Alignment
-
HSUGA decouples and enhances the two core stages of LLM-enhanced sequential recommendation. It adopts the HSU module, which uses "staged processing + four atomic edits (Add/Delete/Update/Retain)," to stabilize semantic extraction from long sequences. It also introduces GAA self-distillation alignment, which groups users by activity (top 20% active / 80% long-tail) to address under-supervision for long-tail users and over-alignment for active users. As a plug-and-play solution, it yields performance gains across Steam/Fashion/Beauty datasets using GRU4Rec/BERT4Rec/SASRec backbones.
- IceBreaker for Conversational Agents: Breaking the First-Message Barrier with Personalized Starters
-
This paper proposes IceBreaker, which addresses the "first-message barrier" for conversational agents through a two-step "handshake": Resonance-aware Interest Distillation to capture trigger interests, followed by Interaction-oriented Starter Generation coupled with Personalized Preference Alignment. In A/B testing on one of the world's largest conversational products, it raised active user days by 1.84‰ and click-through rate (CTR) by 94.25‰.
- Intent-Driven Semantic ID Generation for Grounded Conversational News Recommendation
-
This paper proposes NewsRec-Chat, which inverts conversational news recommendation from a "retrieve-then-generate" paradigm to "generate SID then fuzzy match." By utilizing two-stage SID alignment and GPT-4 CoT distillation, a 7B model directly generates hierarchical Semantic ID prefixes and performs fuzzy matching against the daily news pool. It achieves an L1 of 12.4% (4× random) in a 152K open generation space on the Tencent News platform with 0% hallucinations, while its Profile-Aware Dual-Signal Reasoning enables cold-start users (zero history) to reach 18.0% L1 (where other baselines fail).
- Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction
-
This paper proposes the ReCAP framework, which significantly improves personalized persuasiveness prediction by utilizing a trainable query generator and user profiler to retrieve persuasion-relevant information from user history and construct context-aware user profiles.
- MemRec: Collaborative Memory-Augmented Agentic Recommender System
-
MemRec employs a lightweight LLM to specifically manage a dynamic "Collaborative Memory Graph" (connecting semantic memories of multiple users and items via interaction edges), and feeds distilled "collaborative facets" to a heavy-duty reasoning LLM for final recommendation. By utilizing a "Curate-then-Synthesize" denoising strategy and asynchronous \(O(1)\) label propagation updates, it achieves a relative H@1 improvement of +15% to +29% over the SOTA i2Agent across four benchmarks, with a significant +91.4% gain over Vanilla LLMs for sparse users.
- Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation
-
The authors rewrite "user feedback logs" in recommendation systems into a unified simulation scenario of "User Memory + Exposure List" understandable by LLMs. They then generate explicit chain-of-thought decision processes as "clarifications" using the EKB consumer decision model. Through uncertainty decomposition and rejection sampling, 10K high-quality SFT/DPO data points are distilled, allowing a 3B Llama user simulator to outperform GPT-5 and Gemini-2.5-Flash in predicting real user behavior across 8 domains.
- Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework
-
This paper proposes C-BPO, which treats target user history as positive feedback and other users' history as noisy unlabeled negative feedback. It utilizes PU learning to correct the mis-penalization caused by "preference overlap," allowing the LLM to learn unique user preferences without suppressing general task capabilities.
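
The idea of treating other users' history as unlabeled rather than strictly negative is the core of PU (positive-unlabeled) learning. As a hedged illustration, the sketch below implements the well-known non-negative PU risk estimator on scalar scores. It is not C-BPO's objective: the logistic loss, the function names, and the `prior` parameter (the assumed share of the unlabeled data that actually matches the target user's preferences) are assumptions.

```python
import math

def logistic_loss(score, label):
    """Logistic loss for a real-valued score and a label in {+1, -1}."""
    return math.log1p(math.exp(-label * score))

def mean(values):
    return sum(values) / len(values)

def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk estimate from positive and unlabeled scores."""
    risk_pos_as_pos = mean([logistic_loss(s, +1) for s in pos_scores])
    risk_pos_as_neg = mean([logistic_loss(s, -1) for s in pos_scores])
    risk_unl_as_neg = mean([logistic_loss(s, -1) for s in unl_scores])
    # Remove the share of "negative" loss that comes from hidden positives,
    # and clamp at zero so the corrected term cannot go negative.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)

# Hypothetical model scores: target-user history vs. other users' history.
pos_scores = [2.0, 3.0]
unl_scores = [-1.0, 2.5]  # the 2.5 item may overlap with the target user's taste
corrected = nn_pu_risk(pos_scores, unl_scores, prior=0.5)
```

The subtraction is what keeps an overlapping item in the unlabeled set from being penalized as if it were a confirmed negative.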
- Quality Over Clicks: Intrinsic Quality-Driven Iterative RL for Cold-Start E-Commerce Query Suggestion
-
The authors propose Cold-EQS, a query suggestion framework for cold-start e-commerce scenarios. It uses answerability, factual accuracy, and information gain as intrinsic quality rewards to continuously optimize query suggestion quality through iterative reinforcement learning, achieving a 6.81% gain in online chatUV.
- ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning
-
This paper proposes ReRec, a Reinforcement Fine-tuning (RFT) framework for recommendation assistants. It provides fine-grained reward signals through dual-graph augmented reward shaping, differentiated supervision of reasoning steps via Reasoning-Aware Advantage Estimation (RAAE), and dynamic adjustment of training difficulty via an online curriculum scheduler. ReRec enables LLMs to handle complex multi-step reasoning recommendation queries, significantly outperforming existing methods on the RecBench+ benchmark.
- SenseJudge: Human-Centric Preference-Driven Judgment Framework
-
This paper proposes SenseJudge, a customizable LLM judgment framework based on explicit human preferences, along with SenseBench, a real-world multi-turn conversation benchmark. In personalized judgment tasks, the framework achieves an average accuracy 16.08% higher than baselines, with model rankings consistent with real human rankings.
- What Makes an Ideal Quote? Recommending "Unexpected yet Rational" Quotations via Novelty
-
NOVELQR is a novelty-driven quote recommendation framework that constructs a deep semantic knowledge base via generative label proxies for rational retrieval, and uses a token-level novelty estimator to mitigate auto-regressive completion bias, significantly enhancing recommendation quality across bilingual benchmarks.
- What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context
-
This paper reveals that existing LLM-based recommendation systems lose critical information—preference intensity and temporal context—due to binary preference modeling. It proposes the RecPO framework, which incorporates these two factors into preference optimization through an adaptive reward margin, significantly outperforming baselines like S-DPO across five datasets.
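
To illustrate what a reward margin does inside a DPO-style loss, here is a minimal sketch. The loss form (a DPO objective with a margin subtracted inside the sigmoid) is a common construction; the `adaptive_margin` function, including its rating-gap and recency terms and its `scale` and `half_life` parameters, is purely hypothetical and is not RecPO's actual margin.

```python
import math

def margin_dpo_loss(logratio_chosen, logratio_rejected, margin, beta=1.0):
    """-log sigmoid(beta * (chosen - rejected) - margin).

    Each logratio is log pi(item | context) - log pi_ref(item | context).
    """
    z = beta * (logratio_chosen - logratio_rejected) - margin
    return math.log1p(math.exp(-z))

def adaptive_margin(rating_gap, days_since_rejected, scale=1.0, half_life=30.0):
    """Hypothetical margin: larger for stronger preferences, smaller for stale ones."""
    recency = 0.5 ** (days_since_rejected / half_life)
    return scale * rating_gap * recency

# Same model outputs, two different preference pairs (made-up numbers).
strong_recent = margin_dpo_loss(1.0, 0.0, adaptive_margin(2.0, 0.0))
weak_old = margin_dpo_loss(1.0, 0.0, adaptive_margin(0.5, 90.0))
```

With a margin of zero this reduces to the standard binary DPO loss; a larger margin demands a bigger gap between the chosen and rejected items before the loss becomes small.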
- Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation
-
SiPeR addresses the challenges of dynamic user preferences and implicit expressions in situated conversational recommendation through Scene Transition Estimation ("Where") and Bayesian Inverse Inference ("What"), achieving performance gains of 10.9% and 10.6% on SIMMC 2.1 and SCREEN, respectively.