
🎁 Recommender Systems

🔬 ICLR2026 · 24 paper notes


🔥 Top topics: Recommendation ×12 · LLM ×10 · Personalized Generation ×4 · Diffusion Models ×3 · Reasoning ×3

Adaptive Regularization for Large-Scale Sparse Feature Embedding Models

This paper theoretically explains the root cause of "one-epoch overfitting" in CTR/CVR models—where performance collapses after the first epoch—using Rademacher complexity. It identifies that the unconstrained growth of embedding norms expands the generalization bound. Consequently, the authors propose AdamAR, an adaptive regularization method that allocates norm budgets based on feature frequency: applying light regularization for high-frequency features and heavy regularization for low-frequency ones. This approach eliminates multi-epoch overfitting while improving single-epoch performance and has been deployed in Alibaba's search advertising system.
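The frequency-dependent idea can be sketched as follows. This is only an illustration, not the authors' AdamAR: the function names, the `1/sqrt(count)` schedule, the base coefficient, and the use of plain SGD instead of Adam are all assumptions made for the example.

```python
import math

def freq_scaled_decay(count, base=0.1):
    """Hypothetical schedule: rarer features get a larger L2 coefficient."""
    return base / math.sqrt(max(count, 1))

def sgd_step(embedding, grad, count, lr=0.1):
    """One plain SGD step with frequency-scaled L2 shrinkage (not Adam)."""
    lam = freq_scaled_decay(count)
    return [w - lr * (g + lam * w) for w, g in zip(embedding, grad)]

# With a zero task gradient, only the regularizer moves the embedding.
rare = sgd_step([1.0, -2.0], [0.0, 0.0], count=1)
frequent = sgd_step([1.0, -2.0], [0.0, 0.0], count=10_000)
```

In this toy setup the embedding of the rare feature shrinks toward zero faster than that of the frequent one, which matches the "heavy for low-frequency, light for high-frequency" allocation described above.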

Beyond Markovian Drifts: Action-Biased Geometric Walks with Memory for Personalized Summarization

This paper proposes the "Structured Walk Hypothesis" (SWH) to challenge the prevailing "Markovian Drift Hypothesis" (MDH) in personalized summarization. It introduces Walk2Pers, a lightweight encoder-decoder model that characterizes user preference evolution as an action-biased geometric walk with dual memory channels, decomposable into magnitude and orientation (continuity vs. novelty). It significantly outperforms specialized summarizers and Large Language Models (LLMs) across three benchmarks.

Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation

Addressing the issue where injecting item IDs into an LLM causes collaborative signals and linguistic semantics to conflict, this paper proposes IDIOMoE, which splits the FFN of each pre-trained LLM block into a text expert and an item expert. Static token-type gating routes each token by its type (item-ID tokens go to the item expert, all others to the text expert), so the model decouples "collaborative filtering" and "semantic understanding" into different subnetworks. This achieves state-of-the-art recommendation performance on both public and industrial-scale datasets while maintaining the original LLM's linguistic capabilities.
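A minimal sketch of static token-type gating, assuming toy stand-ins for the two experts. The functions `text_expert`, `item_expert`, and `route` are hypothetical names for illustration; in the paper the experts are FFNs inside each LLM block.

```python
def text_expert(h):
    # Stand-in for the pre-trained FFN that handles ordinary text tokens.
    return [2.0 * x for x in h]

def item_expert(h):
    # Stand-in for the FFN dedicated to item-ID tokens.
    return [x + 1.0 for x in h]

def route(hidden_states, is_item_token):
    """Static gating: the token type alone picks the expert; no learned router."""
    return [
        item_expert(h) if is_item else text_expert(h)
        for h, is_item in zip(hidden_states, is_item_token)
    ]

# First token is text, second is an item ID.
out = route([[1.0, 2.0], [3.0, 4.0]], [False, True])
```

Because the routing decision is fixed by token type, text tokens never pass through the item expert, which is one way to read the claim that the original linguistic capabilities are preserved.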

CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation

Observing that the KV caches of different users in sequential recommendation are highly similar to one another (a collaborative signal), the authors propose CollectiveKV, which decomposes each KV cache into a low-dimensional user-specific part and a high-dimensional shared part retrieved from a global KV pool. It achieves a 0.8% compression rate without performance degradation.

Continual Low-Rank Adapters for LLM-based Generative Recommender Systems

PESO transforms continual learning for LLM-based generative recommendations from "stacking multiple frozen adapters" into "a single evolving LoRA + a proximal regularization term." By gently anchoring each update to the previous stage's state, the model automatically balances retaining long-term preferences and absorbing new ones, consistently outperforming cumulative LoRA and simple evolving LoRA across three real-world datasets.
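The "task loss plus proximal term" structure can be sketched as below. The squared L2 penalty and the name `proximal_loss` are assumptions for illustration; the summary does not state which distance PESO uses.

```python
def proximal_loss(task_loss, theta, theta_prev, lam=0.1):
    """Task loss plus a penalty anchoring theta to the previous stage's weights.

    Assumes a squared-L2 proximal term; lam trades stability for plasticity.
    """
    penalty = sum((a - b) ** 2 for a, b in zip(theta, theta_prev))
    return task_loss + 0.5 * lam * penalty

unchanged = proximal_loss(1.0, [1.0, 2.0], [1.0, 2.0])
drifted = proximal_loss(1.0, [2.0, 2.0], [1.0, 2.0])
```

A larger `lam` keeps the adapter closer to the previous stage (retaining long-term preferences), while a smaller one lets it absorb new data more freely.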

Discrete Diffusion for Bundle Construction

DDBC reformulates "Bundle Construction" (selecting a group of items from a large library to form a complete bundle or completing a partial one) as a masked discrete diffusion process. It employs Residual Vector Quantization (RVQ) to compress each item into discrete codes within a shared codebook to mitigate the dimensionality explosion of massive item libraries. A bidirectional Transformer then restores [MASK] tokens into a complete bundle in an order-independent manner, achieving a relative improvement of over 100% on long-bundle datasets compared to the strongest baselines.
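Residual Vector Quantization itself can be sketched in a few lines. The toy two-level codebooks and the function names below are assumptions for illustration; they are not DDBC's learned codebooks, and the sketch covers only the encoding step, not the diffusion model.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Quantize vec level by level; each level codes the previous residual."""
    residual, codes = list(vec), []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual

toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # coarse level
    [[0.0, 0.0], [0.5, -0.5]],  # refinement level
]
codes, residual = rvq_encode([1.4, 0.6], toy_codebooks)
```

Each item becomes a short tuple of code indices, so the vocabulary the Transformer must predict over is bounded by the codebook size rather than by the number of items in the library.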

From Evaluation to Defense: Advancing Safety in Video Large Language Models

This paper constructs VideoSafetyEval (11.4k video-query pairs covering 19 risk categories) and uses it to show that the video modality causes a 34.2% decline in safety performance. It then proposes VideoSafety-R1, a three-stage framework (Alarm Token + SFT + Safety-guided GRPO) that increases the defense success rate by 71.1% on VSE-HH.

GoalRank: Group-Relative Optimization for a Large Ranking Model

The authors prove that any Multi-Generator-Evaluator ranking system can be approximated with smaller error by a larger generator-only model that satisfies the scaling law. Accordingly, they propose GoalRank, which trains a large generator-only ranking model by using a reward model to construct group-relative reference policies. It significantly outperforms state-of-the-art methods in online A/B tests.

iFusion: Integrating Dynamic Interest Streams via Diffusion Model for Click-Through Rate Prediction

iFusion reformulates "long-short term user interest fusion" as a conditional generation problem—utilizing short-term interests as guidance to perform diffusion denoising on long-term interest representations. This approach bypasses the assumptions of traditional linear fusion (concatenation/attention/gating), achieving CTR improvements across public datasets, industrial datasets, and online A/B tests.

In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations

Through large-scale controlled experiments on 12 LLMs from 6 providers across three domains (news, academia, and e-commerce), this study reveals that LLMs possess systematic latent source preferences. When content semantics are identical, simply changing the source labels can significantly alter the model's information selection behavior, and these preferences cannot be eliminated through prompt engineering.

Low-pass Personalized Subgraph Federated Recommendation

Addressing the issues of representation misalignment and popularity bias caused by immense differences in "subgraph size and connectivity" among clients in federated recommendation, LPSFed utilizes low-pass spectral filtering to extract structural signals that remain stable across subgraphs. These signals measure the similarity between clients and a neutral anchor graph to guide personalized parameter aggregation, supplemented by an adaptive margin correction to handle long-tail popularity. Results show an NDCG improvement of up to 24% across five datasets.
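To illustrate what a low-pass graph filter does, the sketch below applies one step of lazy neighbor averaging, which corresponds to the spectral response `1 - λ/2` on the random-walk Laplacian. The specific filter, the toy path graph, and the name `low_pass` are assumptions; the summary does not specify the filter LPSFed uses.

```python
def low_pass(signal, neighbors):
    """One pass of 0.5 * x + 0.5 * (mean of neighbors).

    Assumes every node has at least one neighbor.
    """
    out = []
    for node, value in enumerate(signal):
        nbrs = neighbors[node]
        avg = sum(signal[j] for j in nbrs) / len(nbrs)
        out.append(0.5 * value + 0.5 * avg)
    return out

# Path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = low_pass([1.0, 1.0, 1.0, 1.0], path)    # lowest-frequency signal
rough = low_pass([1.0, -1.0, 1.0, -1.0], path)   # highest-frequency signal
```

On this graph the constant signal passes through unchanged while the alternating signal is removed entirely, which is the sense in which low-pass filtering keeps structure that is stable across subgraphs and discards local fluctuation.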

Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders

VISTA decouples the target attention of candidates over ultra-long user histories into a two-stage process: first, compressing million-length histories into hundreds of summary tokens to be cached; second, performing lightweight attention only on these cached tokens downstream. This keeps training and inference costs constant and has been deployed on Meta’s recommendation platform serving billions of users.
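The two-stage split can be sketched as follows, with loud caveats: mean pooling over equal chunks is a stand-in for the learned summarization, and `summarize` and `target_attention` are hypothetical names, not VISTA's components.

```python
import math

def summarize(history, n_summary):
    """Stage 1 stand-in: mean-pool equal chunks into cached summary tokens."""
    chunk = math.ceil(len(history) / n_summary)
    summaries = []
    for start in range(0, len(history), chunk):
        block = history[start:start + chunk]
        dim = len(block[0])
        summaries.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return summaries

def target_attention(candidate, cached):
    """Stage 2: softmax attention of one candidate over the cached tokens only."""
    scores = [sum(c * s for c, s in zip(candidate, tok)) for tok in cached]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(cached[0])
    return [sum(w * tok[d] for w, tok in zip(weights, cached)) / total for d in range(dim)]

history = [[float(i % 2), 1.0] for i in range(1000)]
cached = summarize(history, n_summary=10)   # computed once, then cached
out = target_attention([1.0, 0.0], cached)  # cost depends on 10 tokens, not 1000
```

The point of the split is that the per-candidate cost in stage 2 depends only on the number of cached summary tokens, not on the raw history length.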

More Than What Was Chosen: LLM-based Explainable Recommendation Beyond Noisy User Preferences

Items clicked by users are not necessarily truly liked—this paper proposes "Coherent Preference" (CP) to supplement traditional "Revealed Preference" (RP), and designs a conflict-aware DPO variant, C-APO. It amplifies the influence when RP and CP are consistent and suppresses it when they conflict, thereby simultaneously improving recommendation accuracy and the persuasiveness of rationales.
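One way to picture "amplify on agreement, suppress on conflict" is a per-example weight on the standard DPO loss. This is a hypothetical sketch, not the C-APO objective: the weights `up` and `down`, the boolean `agree` flag, and the function names are assumptions.

```python
import math

def dpo_loss(margin, beta=0.1):
    """Standard DPO term: -log(sigmoid(beta * margin)).

    margin is the chosen-minus-rejected difference of policy/reference log-ratios.
    """
    return math.log1p(math.exp(-beta * margin))

def conflict_aware_loss(margin, agree, up=1.5, down=0.5):
    """Scale the DPO term up when RP and CP agree, down when they conflict."""
    weight = up if agree else down
    return weight * dpo_loss(margin)

consistent = conflict_aware_loss(0.0, agree=True)
conflicting = conflict_aware_loss(0.0, agree=False)
```

Under this weighting, a clicked item whose rationale is incoherent contributes less gradient than one where the click and the coherent preference point the same way.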

Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies

Addressing the issue where "completely deterministic" logging policies in industrial ranking systems cause severe bias in traditional IPS-based estimators, this paper proposes the CIPS estimator (and its doubly robust extension CDR). By replacing the "policy probability ratio" with the "user click probability ratio" as the importance weight, it relaxes the support condition required for unbiasedness from "stochastic logging policies" to "intrinsic randomness in click behavior," achieving low-bias or even unbiased evaluation under deterministic logging.

On the Mechanisms of Collaborative Learning in VAE Recommenders

This paper theoretically reveals that whether users can "help each other" in VAE Collaborative Filtering (CF) is determined by their distance in the latent space (a derivable "sharing radius"). It points out that clean inputs only utilize local collaboration, while \(\beta\)-KL and input masking promote global collaboration at certain costs. Accordingly, the authors propose Personalized Item Alignment (PIA), a training-only anchor regularization that pulls masked user representations toward the anchor centers of their interacted items. This stabilizes the geometric structure and facilitates semantically aligned global collaboration, achieving improvements across three public datasets and online A/B tests on the Amazon streaming platform.

ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation

This work proposes ProPerSim, a simulation framework that constructs 32 user personas based on the Big Five personality traits within the Smallville household environment. AI assistants perform proactive recommendation decisions every 2.5 minutes. Through DPO preference learning over a 14-day simulation, user satisfaction improved from 2.2/4 to 3.3/4, validating for the first time the feasibility of unifying proactivity and personalization.

Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning

This paper proposes the ConvRec-R1 two-stage framework to train LLM-based conversational recommender systems: first, a Remap–Reflect–Adjust distillation pipeline is used to generate high-quality demonstrations from a black-box teacher that are "grounded within the target catalog" for SFT warmup; then, Rank-GRPO (recrafting GRPO to treat "each rank" in the recommendation list as an action unit) is applied for RL alignment. This allows small models (0.5B–3B) to converge faster on REDDIT-V2 in terms of Recall/NDCG and match or even exceed GPT-4o.
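The shift from sequence-level to rank-level credit can be sketched as below, assuming each rank already has its own scalar reward. Normalizing per rank across the sampled group is one reading of "each rank as an action unit"; the function name and the reward values are illustrative, not taken from the paper.

```python
from statistics import mean, pstdev

def rank_level_advantages(rewards, eps=1e-8):
    """rewards[g][k]: reward of the item at rank k in sampled list g.

    Each rank is normalized against the same rank in the other sampled lists.
    """
    n_ranks = len(rewards[0])
    advantages = [[0.0] * n_ranks for _ in rewards]
    for k in range(n_ranks):
        column = [r[k] for r in rewards]
        mu, sigma = mean(column), pstdev(column)
        for g, r in enumerate(column):
            advantages[g][k] = (r - mu) / (sigma + eps)
    return advantages

# Three sampled recommendation lists, two ranks each.
adv = rank_level_advantages([[1.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
```

Here the second list is penalized at rank 1 but not singled out at rank 2, whereas a single sequence-level advantage would assign the same credit to both positions.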

Reinforced Latent Reasoning for LLM-based Recommendation

Addressing the pain points where explicit Chain-of-Thought (CoT) in LLM recommendations is both difficult to obtain as supervisory data and slow during inference, this paper proposes LatentR3. By adding a LatentRATT attention layer at the top of the LLM to compress reasoning into a continuous latent space (requiring only 1 latent token) and employing a modified GRPO (utilizing PPL continuous rewards + batch-level advantage), the model learns latent reasoning end-to-end without any CoT supervision. This approach yields relative improvements of 17.0% and 8.4% when applied to BIGRec and D3, respectively.

RPM: Reasoning-Level Personalization for Black-Box Large Language Models

RPM upgrades black-box LLM personalization from "aligning final responses" to "aligning underlying reasoning processes." It automatically extracts a structured user model of "features → factors → statistics" from raw user history, constructs personalized reasoning paths for each history entry, and feeds these reasoning examples to the model via feature-based retrieval. This enables the LLM to reason following the user's private logic, consistently outperforming existing response-level personalization methods across four task categories with enhanced interpretability.

Search Arena: Analyzing Search-Augmented LLMs

The authors construct Search Arena—the first large-scale search-augmented LLM human preference dataset (24,069 conversations + 12,652 preference votes across 71 languages). The study discovers that user preferences are heavily influenced by citation count (even when citations do not support statements), community-driven platforms are preferred over Wikipedia, and search augmentation does not degrade general chat performance, whereas general LLMs significantly deteriorate in search scenarios.

Steering Diffusion Models Towards Credible Content Recommendation

Addressing the issue of diffusion models recommending untrustworthy content like fake news or misinformation, this paper proposes Disco: a "decoupled diffusion model" that separates user preference signals from untrustworthy signals. It suppresses untrustworthy content by projecting the diffusion target into the null space of untrustworthy features and progressively detects potential untrustworthy items to complete this null space under label scarcity, achieving higher recommendation accuracy and credibility across three real-world datasets.
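Null-space projection is easiest to see for a single direction. The sketch below removes the component of a target vector along one "untrustworthy" feature direction; handling a multi-dimensional subspace, and how Disco estimates that subspace, are outside this illustration, and the names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_out(target, direction):
    """Remove from target its component along direction (assumed non-zero)."""
    scale = dot(target, direction) / dot(direction, direction)
    return [t - scale * d for t, d in zip(target, direction)]

untrusted_dir = [1.0, 0.0]
cleaned = project_out([3.0, 4.0], untrusted_dir)
```

After projection the result is orthogonal to the chosen direction, so whatever part of the signal lived along it can no longer influence the target.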

Supporting High-Stakes Decision Making Through Interactive Preference Elicitation in the Latent Space

This paper addresses high-stakes, low-frequency, and sparse-feedback decision-making scenarios such as apartment hunting. It combines LLM preference priors obtained from user interviews, Autoencoder latent space compression, and Preferential Bayesian Optimization (PBO). By learning user utility functions with fewer pairwise comparisons, it achieves higher ranking accuracy on real housing data compared to vanilla PBO.
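Learning from pairwise comparisons usually rests on a preference likelihood. The sketch below uses a logistic (Bradley-Terry style) link over latent utilities; this link, the item names, and the utility values are assumptions for illustration, and the paper's PBO model may use a different link.

```python
import math

def prefer_prob(u_a, u_b):
    """Probability that option a is preferred to b under a logistic link."""
    return 1.0 / (1.0 + math.exp(-(u_a - u_b)))

def log_likelihood(utilities, comparisons):
    """comparisons: (winner, loser) pairs of keys into the utilities dict."""
    return sum(
        math.log(prefer_prob(utilities[w], utilities[l])) for w, l in comparisons
    )

observed = [("flat_a", "flat_b"), ("flat_a", "flat_c"), ("flat_b", "flat_c")]
good_fit = log_likelihood({"flat_a": 2.0, "flat_b": 1.0, "flat_c": 0.0}, observed)
bad_fit = log_likelihood({"flat_a": 0.0, "flat_b": 1.0, "flat_c": 2.0}, observed)
```

A utility function that orders the options the way the user did scores a higher likelihood, and a stronger prior over utilities means fewer comparisons are needed to separate good fits from bad ones.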

Token-Efficient Item Representation via Images for LLM Recommender Systems

The authors propose I-LLMRec, which utilizes item images instead of lengthy text descriptions to represent item semantics in recommendation systems. Through the RISA alignment module and RERI retrieval module, the framework represents an item with only a single token while preserving rich semantics. It achieves an approximate 2.93x inference speedup and outperforms text-description-based methods in recommendation performance.

Token-Efficient Long-Term Interest Sketching and Internalized Reasoning for LLM-based Recommendation

This paper proposes SIREN, which uses "long-term interest sketches" to compress hundreds of user histories into a short sequence of "liked/disliked semantic topics" for LLMs. It employs a "two-stage training" process: first, learning explicit CoT reasoning via RL, and second, internalizing this reasoning into model parameters through hidden state alignment. This maintains CoT-level accuracy under answer-only decoding, reducing input tokens by 48.7% and inference latency by over 100× compared to CoT.