🔄 Self-Supervised Learning
🤖 AAAI2026 · 16 paper notes
📌 Same area in other venues: 📷 CVPR2026 (89) · 🔬 ICLR2026 (81) · 💬 ACL2026 (1) · 🧪 ICML2026 (28) · 🧠 NeurIPS2025 (33) · 📹 ICCV2025 (13)
🔥 Top topics: Adversarial Robustness ×5 · Continual Learning ×3 · Alignment/RLHF ×3
- BCE3S: Binary Cross-Entropy Based Tripartite Synergistic Learning for Long-tailed Recognition
-
This paper proposes BCE3S, a binary cross-entropy (BCE)-based tripartite synergistic learning framework that integrates BCE-based joint learning, BCE-based contrastive learning, and BCE-based classifier uniformity learning. Because the sigmoid decouples the per-class logits, BCE3S suppresses the imbalance effects inherent to long-tailed distributions and achieves state-of-the-art performance on CIFAR-10/100-LT, ImageNet-LT, and iNaturalist2018.
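The decoupling claim can be checked with a small example. The sketch below is not the BCE3S loss itself; it only contrasts plain one-vs-rest BCE with softmax on hypothetical logits, and the helper names (`bce_terms`, `softmax`) are made up for illustration. Changing the logit of one class leaves the BCE terms of the other classes untouched, while every softmax probability shifts.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bce_terms(logits, target):
    """Per-class one-vs-rest BCE terms for a single sample."""
    terms = []
    for k, z in enumerate(logits):
        p = sigmoid(z)
        y = 1.0 if k == target else 0.0
        terms.append(-(y * math.log(p) + (1.0 - y) * math.log(1.0 - p)))
    return terms

# Hypothetical logits for a 3-class problem; only class 2 changes.
logits = [2.0, -1.0, 0.5]
bumped = [2.0, -1.0, 3.0]

bce_before = bce_terms(logits, target=0)
bce_after = bce_terms(bumped, target=0)
sm_before = softmax(logits)
sm_after = softmax(bumped)

print("BCE terms before:", bce_before)
print("BCE terms after: ", bce_after)   # terms 0 and 1 are unchanged
print("softmax before:  ", sm_before)
print("softmax after:   ", sm_after)    # every probability changes
```

Under softmax, a large logit for a head class lowers the probability of every tail class through the shared normalizer; the sigmoid has no such shared term.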
- CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds
-
This paper proposes CATFormer, a data-replay-free continual learning framework built upon a spiking Vision Transformer, which achieves task-specific neuronal excitability modulation via context-adaptive dynamic firing thresholds. Over sequences of up to 100 tasks, the model not only avoids forgetting but actually improves in accuracy, a phenomenon the authors term "reverse forgetting."
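To make the role of a firing threshold concrete, here is a minimal leaky integrate-and-fire (LIF) sketch. It is an illustration of how a threshold modulates excitability, not CATFormer's architecture: the function `lif_spike_count`, the decay constant, and the two per-task threshold values are all assumptions chosen for the example.

```python
def lif_spike_count(inputs, threshold, decay=0.9):
    """Count spikes of a single LIF neuron with hard reset to zero."""
    membrane = 0.0
    spikes = 0
    for current in inputs:
        membrane = decay * membrane + current  # leaky integration
        if membrane >= threshold:
            spikes += 1
            membrane = 0.0                     # reset after firing
    return spikes

# The same input stream seen under two hypothetical task contexts,
# each with its own firing threshold.
inputs = [0.5] * 10
spikes_task_a = lif_spike_count(inputs, threshold=1.0)
spikes_task_b = lif_spike_count(inputs, threshold=1.5)

print(spikes_task_a, spikes_task_b)
```

With identical weights and inputs, the neuron fires less often under the higher threshold, so adjusting only thresholds per context changes the network's response without touching the shared weights.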
- Expandable and Differentiable Dual Memories with Orthogonal Regularization for Exemplar-free Continual Learning
-
This paper proposes EDD (Expandable and Differentiable Dual Memory), an exemplar-free continual learning method that decomposes data into reusable sub-features via differentiable shared and task-specific memories, combined with memory expansion-pruning and orthogonal regularization mechanisms. EDD surpasses 14 state-of-the-art methods on CIFAR-10/100 and Tiny-ImageNet, achieving final accuracies of 55.13%, 37.24%, and 30.11%, respectively.
- Explanation-Preserving Augmentation for Semi-Supervised Graph Representation Learning
-
This paper proposes EPA-GRL (Explanation-Preserving Augmentation for Graph Representation Learning), which employs a GNN explainer trained with a small number of labels to identify semantic subgraphs (explanation subgraphs). During augmentation, only the non-semantic portions (marginal subgraphs) are perturbed, achieving semantics-preserving graph augmentation. EPA-GRL significantly outperforms semantics-agnostic random augmentation methods across 6 benchmarks.
- FedGRPO: Privately Optimizing Foundation Models with Group-Relative Rewards from Domain Clients
-
This paper proposes FedGRPO, which reformulates foundation model optimization as a reward-based evaluation process. Through competence-aware expert selection and federated group-relative policy optimization (transmitting only scalar reward signals), FedGRPO achieves privacy-preserving, communication-efficient federated foundation model optimization, approaching or surpassing centralized GRPO on mathematical reasoning and question-answering tasks.
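The group-relative idea can be sketched in a few lines. This shows the commonly used GRPO normalization (reward minus group mean, divided by group standard deviation); whether FedGRPO uses exactly this form is an assumption, and the reward values and the function name `group_relative_advantages` are hypothetical. The point is that the server needs only one scalar per candidate response.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize scalar rewards within one group of candidate responses."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical scalar rewards returned by clients for four candidate
# answers to the same prompt (1.0 = judged correct, 0.0 = judged wrong).
client_rewards = [0.0, 1.0, 1.0, 0.0]
advantages = group_relative_advantages(client_rewards)

print(advantages)
```

Responses that score above the group average receive a positive advantage and the rest a negative one, so no absolute reward scale and no raw client data has to be shared.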
- FineXtrol: Controllable Motion Generation via Fine-Grained Text
-
This paper proposes FineXtrol, a framework that leverages temporally annotated, fine-grained body-part text descriptions as control signals. By combining a dual-branch ControlNet architecture with hierarchical contrastive learning to enhance the discriminability of the text encoder, FineXtrol achieves efficient, user-friendly, and precise controllable human motion generation, significantly outperforming existing methods on multi-body-part control benchmarks on HumanML3D.
- From Pretrain to Pain: Adversarial Vulnerability of Video Foundation Models without Finetuning
-
This paper proposes Transferable Video Attack (TVA), which generates adversarial perturbations solely by exploiting the embedding space of open-source Video Foundation Models (VFMs), without any knowledge of downstream tasks, and effectively attacks downstream models and multimodal LLMs across 24 video tasks.
- GOAL: Geometrically Optimal Alignment for Continual Generalized Category Discovery
-
Grounded in Neural Collapse theory, this paper replaces dynamic classifiers with a fixed Equiangular Tight Frame (ETF) classifier and achieves continual generalized category discovery via supervised alignment and confidence-guided unsupervised alignment, reducing forgetting by 16.1% and improving novel category discovery by 3.2% across four benchmarks.
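For readers unfamiliar with the geometry, a simplex ETF over K classes is a set of K unit vectors whose pairwise cosine similarity is -1/(K-1), the most separated configuration possible. The sketch below builds one in K dimensions using the standard construction with an identity rotation; it is not GOAL's code, and `simplex_etf` is a name chosen for illustration.

```python
import math

def simplex_etf(num_classes):
    """Return K prototype vectors in R^K forming a simplex ETF."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

prototypes = simplex_etf(4)
norm_sq = dot(prototypes[0], prototypes[0])  # squared norm, close to 1
cosine = dot(prototypes[0], prototypes[1])   # close to -1/(K-1) = -1/3

print(norm_sq, cosine)
```

Because the prototypes are fixed in advance, features can be aligned to them instead of retraining a classifier head as new categories arrive.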
- HiLoMix: Robust High- and Low-Frequency Graph Learning Framework for Mixing Address Association
-
This paper proposes HiLoMix, a robust graph learning framework for the mixing address association task. It addresses three core challenges (graph sparsity, label scarcity, and label noise) through a Heterogeneous Attribute Mixing Interaction Graph (HAMIG), frequency-aware graph contrastive learning, and confidence-based label weighting supervision, respectively. HiLoMix surpasses the second-best baseline by 5.69%, 7.34%, and 15.61% in F1, AUC, and MRR, respectively.
- Improving Region Representation Learning from Urban Imagery with Noisy Long-Caption Supervision
-
This paper proposes UrbanLN, a framework that improves urban region representation learning from LLM-generated captions via a long-caption-aware positional encoding interpolation strategy and a dual-level (data and model) noise suppression mechanism.
- Improving Sustainability of Adversarial Examples in Class-Incremental Learning
-
This paper proposes the SAE framework to address the degradation of adversarial examples (AEs) caused by domain drift in class-incremental learning (CIL). Through a semantic correction module (jointly guided by CLIP and the CIL model) and a filtering-and-augmentation module (removing semantically confusing samples), SAE maintains attack effectiveness even after a 9× increase in the number of classes, achieving an average attack success rate improvement of 31.28%.
- Let the Void Be Void: Robust Open-Set Semi-Supervised Learning via Selective Non-Alignment
-
This paper proposes SkipAlign, a framework that introduces a third "skip" operation alongside the conventional pull/push operations in contrastive learning. Low-confidence samples are selectively excluded from alignment and subjected only to mild repulsion, allowing in-distribution (ID) classes to form compact "galaxies" while OOD samples naturally disperse into the "interstellar void." The approach achieves an average AUC improvement of +3.1 on unseen OOD detection, with a maximum gain of +7.1.
- Robust Tabular Foundation Models
-
This paper proposes RTFM, a model-agnostic adversarial training framework that performs min-max optimization over the parameter space of a synthetic data generator, maximizing the "optimality gap" between a tabular foundation model (TFM) and classical tree-based models. Using fewer than 100,000 additional synthetic datasets, RTFM significantly improves TabPFN V2 across multiple tabular benchmarks.
- Self-Supervised Inductive Logic Programming
-
This paper proposes a new self-supervised inductive logic programming (SS-ILP) setting and the Poker system. Starting from a small number of positive labeled examples plus unlabeled examples, Poker automatically generates positive and negative examples and employs a maximally general second-order normal form (SONF) background theory to learn logic programs with recursion and predicate invention, without being given negative examples.
- Spikingformer: A Key Foundation Model for Spiking Neural Networks
-
This paper proposes Spikingformer, which integrates MS Residual with Self-Attention in a spike-driven manner to address the non-spike computation introduced by SEW Residual in Spikformer, while preserving global modeling capability.
- Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space
-
This paper proposes L-HAKT, the first framework to integrate an LLM dual-agent design with hyperbolic geometry for knowledge tracing. A Teacher Agent parses exercise semantics and constructs a hierarchical knowledge graph, while a Student Agent simulates individual learning behaviors to generate synthetic interaction data. Hyperbolic contrastive learning is employed to calibrate the distributional gap between synthetic and real data. L-HAKT achieves an AUC of up to 80.29% across four educational datasets, with an AUC improvement of 13.03% over the GKT baseline on EdNet.