# FreqRec: Exploiting Inter-Session Information with Frequency-enhanced Dual-Path Networks for Sequential Recommendation
- Conference: AAAI 2026
- arXiv: 2511.06285
- Code: https://github.com/AONE-NLP/FreqRec
- Area: Recommender Systems / Sequential Recommendation
- Keywords: Sequential Recommendation, Frequency-Domain Analysis, Dual-Path Network, DFT, Cross-Session Modeling, Frequency-Domain Loss
## TL;DR
This paper proposes FreqRec, a dual-path architecture that applies frequency-domain transformations along the batch axis and the time axis to capture group-level consumption rhythms across sessions and fine-grained individual user interests, respectively. A frequency-domain consistency loss is introduced to explicitly align predicted and ground-truth frequency spectra. FreqRec achieves up to 7.38% improvement in NDCG@10 over the strongest baseline on three Amazon datasets.
## Background & Motivation
Background: Sequential recommendation (SR) aims to predict the next item of interest from a user's historical interaction sequence. Transformer-based methods (e.g., SASRec, BERT4Rec) have become dominant; however, the global receptive field of self-attention inherently acts as a low-pass filter, smoothing out abrupt behavioral changes and periodic consumption patterns. To address this, methods such as FMLPRec, FEARec, and BSARec incorporate frequency-domain modules to recover high-frequency signals.
Limitations of Prior Work:

- Inter-session dependencies are ignored: Existing frequency-domain methods (FMLPRec, BSARec) process each session independently, neglecting spectral correlations across sessions. Individual sessions are naturally short, and isolated processing exacerbates data sparsity. The paper validates the importance of cross-session patterns by computing Pearson correlation coefficients between sessions sharing common items, revealing statistically significant positive correlations.
- Time-domain objectives cannot exploit frequency-domain information: Mainstream SR models are optimized solely with time-domain losses such as cross-entropy or BPR, imposing no explicit constraint on the discrepancy between predicted and ground-truth spectra, thereby wasting periodic and high-frequency behavioral signals.
Key Challenge: Frequency-domain modules can recover high-frequency signals, but per-session processing loses group-level patterns; time-domain losses drive classification accuracy but do not encourage spectral feature learning, creating a gap between the two.
Goal: Simultaneously model cross-session (group-level) and intra-session (user-level) frequency-domain dependencies, and bridge the prediction–ground-truth spectral alignment gap via a frequency-domain loss.
Key Insight: Apply DFT along the batch axis and the time axis respectively to construct a dual-path frequency-enhanced network, coupled with a learnable complex-valued FreqMLP and a frequency-domain consistency loss.
Core Idea: Batch-axis DFT extracts group-shared consumption rhythms; time-axis DFT captures user-specific spectral patterns; a frequency-domain loss aligns predicted and ground-truth spectral coefficients.
## Method
### Overall Architecture
FreqRec consists of two parallel paths:
- Self-Attention Branch: Encodes long-range contextual dependencies and produces the contextual representation \(\mathbf{X_{SA}}\).
- FreqNet Branch: Comprises a Global Spectral Aggregator (GSA) and a Local Spectral Refiner (LSR), which can be combined in parallel or serial fashion to produce the frequency-enhanced representation \(\mathbf{X_F}\).
The two paths are integrated via a gated residual update: \(\mathbf{X_{out}} = (1-\alpha) \cdot \mathbf{X_{SA}} + \alpha \cdot \mathbf{X_F}\).
### Key Designs
1. Frequency-Domain MLP (FreqMLP)
- Function: Performs learnable frequency-domain filtering on complex coefficients produced by the DFT.
- Mechanism: Decomposes complex numbers into real and imaginary parts, processes them with two sets of learnable weight matrices \(\mathcal{W}_r, \mathcal{W}_i\) in a cross-multiplicative manner, enabling information exchange between the real and imaginary parts without manually specifying low-pass or band-pass cutoff frequencies.
- Design Motivation: Traditional frequency-domain methods require manual selection of filter types and cutoff points, whereas FreqMLP learns end-to-end which frequencies to amplify and which to suppress.
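As a concrete illustration, here is a minimal NumPy sketch of the cross-multiplicative filtering, assuming it mirrors the algebra of complex multiplication; the actual FreqMLP's shapes, initialization, and any nonlinearities may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

def freq_mlp(real, imag, Wr, Wi):
    """Cross-multiplicative update on DFT coefficients, mirroring
    complex multiplication:
    (A + iB)(Wr + iWi) = (A@Wr - B@Wi) + i(A@Wi + B@Wr)."""
    return real @ Wr - imag @ Wi, real @ Wi + imag @ Wr

B, N, D = 4, 16, 8                       # batch, sequence length, hidden dim
X = rng.standard_normal((B, N, D))       # item-embedding sequences
spec = np.fft.fft(X, axis=1)             # DFT along the time axis

Wr = rng.standard_normal((D, D)) * 0.1   # learnable real-part weights
Wi = rng.standard_normal((D, D)) * 0.1   # learnable imaginary-part weights
fr, fi = freq_mlp(spec.real, spec.imag, Wr, Wi)
X_filtered = np.fft.ifft(fr + 1j * fi, axis=1).real  # back to the time domain
print(X_filtered.shape)                  # (4, 16, 8)
```

Because Wr and Wi act on every frequency bin, the network can learn which frequencies to amplify or suppress without any hand-set cutoff.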
2. Global Spectral Aggregator (GSA)
- Function: Applies DFT → FreqMLP → IDFT along the batch axis.
- Mechanism: Treats all user sequences in a mini-batch as a single signal and applies the Fourier transform along the batch dimension to extract shared consumption rhythms across users.
- Design Motivation: When an individual user's history is sparse, group-level patterns provide a strong complementary signal. This assumption is confirmed by statistical analysis of sessions sharing common items.
3. Local Spectral Refiner (LSR)
- Function: Applies DFT → FreqMLP → IDFT along the time axis.
- Mechanism: Performs frequency-domain analysis on each user's interaction sequence along the temporal dimension to capture user-specific periodic patterns and abrupt interest shifts.
- Design Motivation: GSA provides group commonality, while LSR is responsible for recovering individualized fine-grained dynamics.
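The two spectral paths differ only in the axis the DFT runs over. A minimal NumPy sketch, with a fixed complex gain standing in for the learnable FreqMLP filter:

```python
import numpy as np

rng = np.random.default_rng(1)
B, N, D = 32, 20, 16
X = rng.standard_normal((B, N, D))       # a mini-batch of user sequences

def spectral_path(x, axis, gain):
    """DFT -> frequency-domain filter -> IDFT along the chosen axis.
    `gain` is a stand-in for the learnable complex FreqMLP."""
    spec = np.fft.fft(x, axis=axis)
    return np.fft.ifft(spec * gain, axis=axis).real

# GSA: transform along the batch axis (axis=0), mixing information
# across the sessions in the mini-batch -> group-level rhythms.
gain_b = rng.standard_normal((B, 1, 1)) + 1j * rng.standard_normal((B, 1, 1))
X_inter = spectral_path(X, axis=0, gain=gain_b)

# LSR: transform along the time axis (axis=1) within each sequence
# -> user-specific periodic patterns and interest shifts.
gain_t = rng.standard_normal((1, N, 1)) + 1j * rng.standard_normal((1, N, 1))
X_intra = spectral_path(X, axis=1, gain=gain_t)
print(X_inter.shape, X_intra.shape)      # both keep the input shape
```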
4. Dual-Path Fusion Strategy
- Parallel Fusion: GSA and LSR independently process the original embeddings; their outputs are combined as \((1-\gamma) \cdot \mathbf{X_{Inter}} + \gamma \cdot \mathbf{X_{Intra}}\).
- Serial Fusion: The output of GSA is added to the original embeddings before being fed into LSR.
- Experiments show that Parallel fusion outperforms Serial fusion, as in the serial variant group-level features tend to overwrite the original sequence signal, creating an information bottleneck.
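The two fusion strategies can be sketched as follows, again with simplified stand-ins for GSA and LSR (a fixed spectral gain rather than learned filters):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 10, 4))      # (batch, seq, dim)

# Stand-ins for GSA (batch-axis DFT) and LSR (time-axis DFT).
gsa = lambda x: np.fft.ifft(np.fft.fft(x, axis=0) * 0.9, axis=0).real
lsr = lambda x: np.fft.ifft(np.fft.fft(x, axis=1) * 0.9, axis=1).real

gamma = 0.5
X_parallel = (1 - gamma) * gsa(X) + gamma * lsr(X)  # both paths see raw embeddings
X_serial = lsr(X + gsa(X))                          # GSA output added, then refined
print(X_parallel.shape, X_serial.shape)
```

In the parallel form each path keeps direct access to the original embeddings, which matches the paper's finding that the serial variant creates an information bottleneck.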
### Loss & Training
The hybrid loss function is defined as: \(\mathcal{L}_{SR} = (1-\beta) \cdot \mathcal{L}_{F} + \beta \cdot \mathcal{L}_{CE}\)
- Cross-Entropy Loss \(\mathcal{L}_{CE}\): Standard classification objective that treats next-item prediction as a classification task over the item set.
- Frequency-Domain Consistency Loss \(\mathcal{L}_{F}\): Applies DFT separately to the prediction \(P\) and target \(T\), then computes distances over the real and imaginary parts (L1/L2/mixed), explicitly constraining the predicted spectral coefficients to align with the ground-truth spectrum.
- The distance function is selected from \(\mathcal{L}_{\mathrm{L1}}\), \(\mathcal{L}_{\mathrm{L2}}\), and \(\mathcal{L}_{\mathrm{mix}}\) via grid search.
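A minimal sketch of the consistency loss and the hybrid objective, assuming the DFT is taken along the time axis and using plain L1/L2 distances; the paper's exact distance forms and normalization may differ:

```python
import numpy as np

def freq_loss(pred, target, mode="l1"):
    """Distance between the DFT spectra of prediction and target,
    computed separately on real and imaginary parts."""
    P = np.fft.fft(pred, axis=1)         # time-axis DFT of the prediction
    T = np.fft.fft(target, axis=1)       # time-axis DFT of the target
    dr, di = P.real - T.real, P.imag - T.imag
    if mode == "l1":
        return np.mean(np.abs(dr) + np.abs(di))
    return np.mean(dr ** 2 + di ** 2)    # "l2"; a mixed variant blends both

rng = np.random.default_rng(3)
p = rng.standard_normal((4, 8, 6))
t = rng.standard_normal((4, 8, 6))

beta = 0.7
l_ce = 1.25                              # placeholder for the cross-entropy term
l_sr = (1 - beta) * freq_loss(p, t) + beta * l_ce   # hybrid objective
print(freq_loss(p, p))                   # identical inputs -> 0.0
```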
## Key Experimental Results
### Main Results
Evaluated on three Amazon datasets (Beauty, Sports & Outdoors, Toys & Games) against 14 baselines:
| Dataset | Metric | FreqRec(P) | BSARec (strongest baseline) | Gain |
|---|---|---|---|---|
| Beauty | HR@10 | 0.0989 | 0.0944 | +4.77% |
| Beauty | NDCG@10 | 0.0601 | 0.0574 | +4.70% |
| Sports | HR@20 | 0.0859 | 0.0830 | +3.49% |
| Sports | NDCG@20 | 0.0401 | 0.0387 | +3.62% |
| Toys | HR@20 | 0.1468 | 0.1379 | +6.45% |
| Toys | NDCG@10 | 0.0653 | 0.0610 | +7.38% |
### Ablation Study
Contribution of each module on the Beauty and Toys datasets:
| Variant | Beauty H@10 | Beauty N@10 | Toys H@20 | Toys N@10 |
|---|---|---|---|---|
| FreqRec (full) | 0.0989 | 0.0601 | 0.1468 | 0.0653 |
| w/o SA | 0.0959 | 0.0587 | 0.1338 | 0.0644 |
| w/o GSA | 0.0881 | 0.0537 | 0.1295 | 0.0568 |
| w/o LSR | 0.0888 | 0.0533 | 0.1268 | 0.0606 |
| w/o GSA+LSR | 0.0787 | 0.0481 | 0.0956 | 0.0436 |
| w/o Freq. Loss | 0.0969 | 0.0582 | 0.1342 | 0.0619 |
| w/o CE Loss | 0.0807 | 0.0477 | 0.0975 | 0.0434 |
### Key Findings
- Frequency-domain methods consistently outperform pure Transformer methods: BSARec achieves HR@10 = 0.0944 on Beauty, substantially higher than MSSR's 0.0897.
- Removing both GSA and LSR causes more than 20% performance degradation, confirming that the dual-path frequency-domain modules are complementary to self-attention.
- GSA is slightly more important than LSR: The performance drop from removing GSA is marginally larger than that from removing LSR, indicating that group-level information is especially critical for sparse users.
- Frequency-domain loss as a plug-and-play component: Integrating \(\mathcal{L_F}\) into baselines such as SASRec, FMLPRec, and BSARec yields average improvements of 3.7%–19.5%. FMLPRec's HR@10 improves by 15.21% and BSARec's NDCG@10 improves by 46.45%.
- Robustness to sparse sequences: For users with only 5–6 interactions, FreqRec achieves significantly higher HR@5 and NDCG@5 than BSARec.
- Robustness to noise: Under cross-domain noisy training (mixed training on Automotive, CDs, and Grocery with per-domain evaluation), FreqRec outperforms BSARec and FMLPRec on all three target domains.
## Highlights & Insights
- Batch-axis DFT is the most significant innovation: Prior frequency-domain methods apply DFT only along the time axis; this paper is the first to apply DFT along the batch axis to model group-level spectral patterns across sessions and users — an approach that is both elegant and effective.
- Generalizability of the frequency-domain loss: \(\mathcal{L_F}\) serves as a plug-and-play component that yields substantial gains across multiple baselines, with the largest benefits observed in models that already incorporate frequency-domain modules, demonstrating that time-domain and frequency-domain losses provide complementary supervision signals.
- Learnable complex-valued MLP replaces hand-crafted filters: End-to-end learning of the frequency response eliminates the need for prior knowledge in filter design.
## Limitations & Future Work
- All three evaluation datasets are drawn from Amazon e-commerce reviews, limiting domain diversity; validation in music, video, or news recommendation scenarios is absent.
- Batch-axis DFT depends on mini-batch composition; batch size and sampling strategies may affect the stability of GSA.
- Item-side information (e.g., text, image, and other multimodal features) is not considered and represents a potential direction for further enhancement.
- The choice of distance function for the frequency-domain loss (L1/L2/mixed) requires grid search, lacking an adaptive mechanism.
- No direct comparison with graph neural network-based cross-session methods (e.g., SR-GNN variants) is provided.
## Related Work & Insights
- FMLPRec / BSARec / FEARec: Pioneering frequency-domain SR works that apply DFT along the time axis; this paper extends the paradigm to the batch axis.
- FNet: A work in NLP that replaces self-attention with Fourier transforms, providing inspiration for incorporating signal processing tools into deep learning.
- Frequency-domain losses in other domains: The idea parallels frequency-aware losses in image generation (e.g., frequency-domain GAN losses), offering potential for cross-domain knowledge transfer.
- Broader Implications: Frequency-domain analysis, as a sequence modeling approach orthogonal to attention mechanisms, has the potential to generalize to time-series forecasting, event sequence modeling, and other domains.
## Rating
- Novelty: ⭐⭐⭐⭐ Batch-axis DFT for cross-session modeling is an interesting and novel perspective.
- Experimental Thoroughness: ⭐⭐⭐⭐ Three main datasets + three noisy cross-domain settings + detailed ablations + sparsity analysis + plug-and-play validation.
- Writing Quality: ⭐⭐⭐⭐ Clear motivation, complete mathematical derivations.
- Value: ⭐⭐⭐⭐ The frequency-domain loss is plug-and-play and directly applicable to existing SR models.