📚 Pretraining¶
🤖 AAAI2026 · 9 paper notes
📌 Same area in other venues: 📷 CVPR2026 (5) · 🔬 ICLR2026 (79) · 💬 ACL2026 (12) · 🧪 ICML2026 (27) · 🧠 NeurIPS2025 (51) · 📹 ICCV2025 (9)
- Beyond Cosine Similarity: Magnitude-Aware CLIP for No-Reference Image Quality Assessment
-
This paper proposes MA-CLIP, which discovers and exploits the magnitude information of CLIP image features as a complementary perceptual quality cue. Combined with cosine similarity, it achieves training-free adaptive dual-cue fusion for image quality assessment.
- ELSPR: Evaluator LLM Training Data Self-Purification on Non-Transitive Preferences
-
ELSPR models pairwise preferences of LLM evaluators as tournament graphs, identifies non-transitive preferences via strongly connected components (SCCs), proposes a normalized directed graph structural entropy metric, and filters problematic training data through graph reconstruction — resulting in a 13.8% reduction in non-transitivity and a 0.088 decrease in structural entropy, while the discarded data achieves only 34.4% human agreement (vs. 52.6% for retained data).
- GranAlign: Granularity-Aware Alignment Framework for Zero-Shot Video Moment Retrieval
-
This paper proposes GranAlign, a training-free granularity-aware alignment framework that addresses the core challenge of semantic granularity mismatch in zero-shot video moment retrieval (ZVMR). By rewriting queries into simplified and detailed variants and matching them against query-agnostic and query-aware video descriptions respectively, GranAlign achieves a 3.23% improvement in mAP@avg on QVHighlights.
- Learning Procedural-aware Video Representations through State-Grounded Hierarchy Unfolding
-
This paper proposes a Task-Step-State (TSS) three-level semantic framework that introduces "state" as a visual grounding layer within the conventional task-step hierarchy, and designs a progressive pretraining strategy following a U-shaped path (Task→Step→State→Step→Task) to unfold the TSS hierarchy stage by stage. The approach achieves comprehensive state-of-the-art performance on task recognition, step recognition, and step forecasting tasks on the COIN and CrossTask datasets.
- No-Regret Strategy Solving in Imperfect-Information Games via Pre-Trained Embedding
-
This paper proposes the Embedding CFR algorithm, which maps information sets in imperfect-information games to a continuous low-dimensional embedding space (rather than discrete clusters), achieving faster exploitability convergence and higher-quality strategy solving under the same space budget.
- Perspective from a Broader Context: Can Room Style Knowledge Help Visual Floorplan Localization?
-
This paper proposes leveraging room style knowledge — obtained via unsupervised clustering pretraining in the form of a room discriminator — to resolve ambiguities caused by repetitive structures in visual floorplan localization (FLoc), achieving state-of-the-art performance on two standard benchmarks: Gibson and Structured3D.
- PrefixGPT: Prefix Adder Optimization by a Generative Pre-trained Transformer
-
PrefixGPT frames prefix adder optimization as a sequence generation problem. A customized GPT model is pretrained to learn design rules, then fine-tuned via RL to generate optimized designs, achieving state-of-the-art area-delay product (ADP) with robustness to initialization.
- Rectified Noise: A Generative Model Using Positive-incentive Noise
-
This paper proposes Rectified Noise (ΔRN), which leverages the positive-incentive noise (π-noise) framework to learn a set of beneficial noise signals and inject them into the velocity field of a pretrained Rectified Flow model, achieving a reduction in FID from 10.16 to 9.05 on ImageNet-1k with only 0.39% additional parameters.
- TRACE: A Generalizable Drift Detector for Streaming Data-Driven Optimization
-
This paper proposes TRACE, a transferable concept drift detector based on attention-based sequence learning. By tokenizing statistical features and employing a dual-attention encoder, TRACE learns drift patterns that generalize across tasks, enabling deployment on unseen datasets and integration as a plug-and-play module into streaming data-driven optimization algorithms.