📚 Pretraining

🤖 AAAI2026 · 9 paper notes

Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment

This paper proposes MA-CLIP, which discovers and exploits the magnitude information of CLIP image features as a complementary perceptual quality cue. Combined with cosine similarity, it achieves training-free adaptive dual-cue fusion for image quality assessment.
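As a minimal sketch of why magnitude can act as a separate cue: cosine similarity is invariant to the scale of the feature vector, so any information carried by the feature norm is invisible to it. The function names and toy vectors below are illustrative assumptions; they are not CLIP outputs and do not reproduce MA-CLIP's fusion rule, which the summary above does not specify.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (scale-invariant)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def split_cues(image_feat, text_feat):
    """Return the two cues separately: direction (cosine) and magnitude (L2 norm).

    Hypothetical helper for illustration; a real pipeline would obtain
    image_feat and text_feat from a CLIP encoder before normalization.
    """
    return cosine_similarity(image_feat, text_feat), float(np.linalg.norm(image_feat))


# Toy stand-ins for a text embedding and two image features that point in
# the same direction but differ in length.
text = np.array([1.0, 0.0, 1.0])
feat = np.array([0.6, 0.2, 0.8])
scaled = 3.0 * feat

cos_a, mag_a = split_cues(feat, text)
cos_b, mag_b = split_cues(scaled, text)
```

Here `cos_a` and `cos_b` are equal while `mag_b` is three times `mag_a`, so a score built on cosine similarity alone cannot distinguish the two features.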

ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences

ELSPR models the pairwise preferences of LLM evaluators as tournament graphs, identifies non-transitive preferences via strongly connected components (SCCs), and proposes a normalized directed graph structural entropy metric. It then filters problematic training data through graph reconstruction, which reduces non-transitivity by 13.8% and structural entropy by 0.088. The discarded data achieves only 34.4% human agreement, versus 52.6% for the retained data.
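The SCC step can be sketched as follows, assuming each preference is stored as a `(winner, loser)` edge. Any SCC with more than one node contains a preference cycle, so its members are involved in non-transitive judgments. This uses Kosaraju's algorithm as a generic choice; the function names are hypothetical, and the structural entropy metric and graph reconstruction are not covered here.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm. `edges` holds (winner, loser) pairs."""
    forward, backward = defaultdict(list), defaultdict(list)
    for winner, loser in edges:
        forward[winner].append(loser)
        backward[loser].append(winner)

    # Pass 1: iterative DFS on the graph, recording nodes by finishing time.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            advanced = False
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def non_transitive_groups(nodes, edges):
    """SCCs with more than one node, i.e. groups containing a preference cycle."""
    return [c for c in strongly_connected_components(nodes, edges) if len(c) > 1]


# A beats B, B beats C, C beats A (a cycle); everyone beats D.
cyclic = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("B", "D"), ("C", "D")]
cyclic_groups = non_transitive_groups(["A", "B", "C", "D"], cyclic)

# A fully transitive ordering A > B > C has no multi-node SCC.
transitive_groups = non_transitive_groups(["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")])
```

In the toy data, `cyclic_groups` contains the single group `{A, B, C}`, while `transitive_groups` is empty.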

GranAlign: Granularity-Aware Alignment Framework for Zero-Shot Video Moment Retrieval

This paper proposes GranAlign, a training-free granularity-aware alignment framework that addresses the core challenge of semantic granularity mismatch in zero-shot video moment retrieval (ZVMR). By rewriting queries into simplified and detailed variants and matching them against query-agnostic and query-aware video descriptions respectively, GranAlign achieves a 3.23% improvement in mAP@avg on QVHighlights.

Learning Procedural-aware Video Representations through State-Grounded Hierarchy Unfolding

This paper proposes a Task-Step-State (TSS) three-level semantic framework that introduces "state" as a visual grounding layer within the conventional task-step hierarchy, and designs a progressive pretraining strategy following a U-shaped path (Task→Step→State→Step→Task) to unfold the TSS hierarchy stage by stage. The approach achieves state-of-the-art performance on task recognition, step recognition, and step forecasting on the COIN and CrossTask datasets.

No-Regret Strategy Solving in Imperfect-Information Games via Pre-Trained Embedding

This paper proposes the Embedding CFR algorithm, which maps information sets in imperfect-information games to a continuous low-dimensional embedding space (rather than discrete clusters), achieving faster exploitability convergence and higher-quality strategy solving under the same space budget.

Perspective from a Broader Context: Can Room Style Knowledge Help Visual Floorplan Localization?

This paper proposes leveraging room style knowledge, captured by a room discriminator pretrained with unsupervised clustering, to resolve ambiguities caused by repetitive structures in visual floorplan localization (FLoc). It achieves state-of-the-art performance on two standard benchmarks, Gibson and Structured3D.

PrefixGPT: Prefix Adder Optimization by a Generative Pre-trained Transformer

PrefixGPT frames prefix adder optimization as a sequence generation problem. A customized GPT model is pretrained to learn design rules, then fine-tuned via RL to generate optimized designs, achieving state-of-the-art area-delay product (ADP) with robustness to initialization.
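For readers unfamiliar with the object being optimized, the sketch below shows what a prefix adder computes: per-bit generate/propagate signals are merged by an associative operator, and the arrangement of those merges (the prefix structure) is what determines area and delay. It uses the simplest serial arrangement purely for illustration; the function names are hypothetical, and this is not PrefixGPT's design encoding or search procedure.

```python
def combine(high, low):
    """Associative prefix operator on (generate, propagate) pairs.

    `high` covers the more significant bit span, `low` the adjacent
    less significant span.
    """
    g_hi, p_hi = high
    g_lo, p_lo = low
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_add(a, b, width):
    """Add two unsigned `width`-bit integers using a serial prefix structure."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit i generates a carry
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit i propagates a carry

    # span[i] is the merged (G, P) over bits i..0. A serial chain is used
    # here; other prefix structures reach the same values with a different
    # pattern of combine() calls, trading area against delay.
    span = [(g[0], p[0])]
    for i in range(1, width):
        span.append(combine((g[i], p[i]), span[i - 1]))

    # With no carry-in, the carry into bit i equals G over bits i-1..0.
    carries = [0] + [group_g for group_g, _ in span]

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    total |= carries[width] << width  # carry-out becomes the top bit
    return total
```

Calling `prefix_add(a, b, 4)` for any 4-bit `a` and `b` matches `a + b`, which gives a quick correctness check for any alternative prefix structure substituted into the loop.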

Rectified Noise: A Generative Model Using Positive-incentive Noise

This paper proposes Rectified Noise (ΔRN), which leverages the positive-incentive noise (π-noise) framework to learn a set of beneficial noise signals and inject them into the velocity field of a pretrained Rectified Flow model, reducing FID from 10.16 to 9.05 on ImageNet-1k with only 0.39% additional parameters.

TRACE: A Generalizable Drift Detector for Streaming Data-Driven Optimization

This paper proposes TRACE, a transferable concept drift detector based on attention-based sequence learning. By tokenizing statistical features and employing a dual-attention encoder, TRACE learns drift patterns that generalize across tasks, enabling deployment on unseen datasets and integration as a plug-and-play module into streaming data-driven optimization algorithms.