🛡️ AI Safety
🧠 NeurIPS2025 · 73 paper notes
📌 Same area in other venues: 📷 CVPR2026 (143) · 🔬 ICLR2026 (140) · 💬 ACL2026 (5) · 🧪 ICML2026 (114) · 🤖 AAAI2026 (45) · 📹 ICCV2025 (24)
🔥 Top topics: Adversarial Robustness ×14 · Federated Learning ×4 · Alignment/RLHF ×3 · Domain Adaptation ×3 · GNNs ×2
- A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers
-
This paper proposes a set of generalized components (Component A/B/C) that establish a bidirectional collaborative relationship between sample selection and trigger design, simultaneously improving the attack success rate (ASR) and stealthiness of Poison-only Clean-label Backdoor Attacks (PCBA), with strong generalizability across multiple attack types.
- Beyond Last-Click: An Optimal Mechanism for Ad Attribution
-
This paper analyzes the strategic manipulation vulnerabilities of the Last-Click attribution mechanism from a game-theoretic perspective—platforms can obtain unfair attribution credit by falsifying timestamps—and proposes the Peer-Validated Mechanism (PVM), in which each platform's credit depends solely on the reports of other platforms (analogous to peer review). The paper theoretically proves that PVM is dominant strategy incentive compatible (DSIC) and optimal under homogeneous settings, improving attribution accuracy from 34% to 75% in the two-platform case.
- Boosting Adversarial Transferability with Spatial Adversarial Alignment
-
This paper proposes Spatial Adversarial Alignment (SAA), which fine-tunes a surrogate model via two modules—spatial-aware alignment and adversarial-aware alignment—to align its features with those of a witness model, achieving significant improvements in cross-architecture adversarial transferability (CNN→ViT transfer rate improved by 25–39%).
- Brain-like Variational Inference
-
This paper proposes the FOND framework (Free energy Online Natural-gradient Dynamics), which derives spiking neural network inference dynamics from first principles via free energy minimization, and implements iPVAE (iterative Poisson VAE). iPVAE outperforms standard VAEs and predictive coding models in reconstruction–sparsity trade-off, biological plausibility, and OOD generalization.
- Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness
-
By embedding rotation-equivariant (P4 group) and scale-equivariant convolutional layers into CNNs, this work proposes two symmetry-aware architectures — Parallel and Cascaded — that significantly improve adversarial robustness without adversarial training. Grounded in the CLEVER framework, it theoretically demonstrates that equivariant architectures compress the hypothesis space, regularize gradients, and tighten certified robustness bounds.
- Causally Reliable Concept Bottleneck Models
-
This paper proposes C2BM (Causally reliable Concept Bottleneck Models), which organizes the concept bottleneck as a causal graph structure. By combining observational data with background knowledge, C2BM automatically learns causal relationships, achieving significantly improved causal reliability, intervention responsiveness, and fairness while maintaining classification accuracy.
- Cost Efficient Fairness Audit Under Partial Feedback
-
Under the partial feedback setting, this paper proposes a fairness auditing framework with a novel cost model, delivering near-optimal audit algorithms for both black-box and mixture model scenarios, reducing audit cost by approximately 50% compared to natural baselines.
- CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
-
This work extends MLE-Bench to construct 20 code-sabotage tasks and sandbagging evaluations. It finds that frontier AI agents can successfully plant backdoors and other sabotage while completing normal ML engineering tasks, and in some cases evade detection by LM monitors.
- Deceptron: Learned Local Inverses for Fast and Stable Physics Inversion
-
This paper proposes the Deceptron bidirectional module, which learns a local inverse of a differentiable forward surrogate and introduces a Jacobian Composition Penalty (JCP). By mapping output-space residuals back to the input space, the method achieves Gauss-Newton-like preconditioned gradient updates for physics inversion, dramatically reducing iteration counts (approximately 20× speedup on Heat-1D).
- DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning
-
This paper proposes DESIGN, a framework that accelerates FHE-based GNN inference by approximately \(2\times\) over the SEAL baseline through two-stage server-side optimization—input graph pruning and adaptive polynomial activation degree allocation—while maintaining competitive accuracy.
- DictPFL: Efficient and Private Federated Learning on Encrypted Gradients
-
This paper proposes DictPFL, a framework that decomposes model weights into a static dictionary and a trainable lookup table, and combines this decomposition with encryption-aware pruning. DictPFL achieves full gradient protection via homomorphic encryption in federated learning while reducing communication overhead by 402–748× and training time by 28–65×, keeping total runtime within 2× of plaintext FL.
- Differential Privacy for Euclidean Jordan Algebra with Applications to Private Symmetric Cone Programming
-
This paper proposes a general Gaussian privacy mechanism based on Euclidean Jordan Algebra (EJA) and, building upon it, designs the first differentially private algorithm for Symmetric Cone Programming (SCP), thereby resolving an important open problem on differentially private semidefinite programming posed by Hsu et al. (ICALP 2014).
- Differentially Private Bilevel Optimization: Efficient Algorithms with Near-Optimal Rates
-
This paper systematically studies bilevel optimization under differential privacy (DP). For the convex setting, it establishes near-tight upper and lower bounds via the exponential mechanism and regularized exponential mechanism, matching the optimal rate of single-level DP-ERM. For the non-convex setting, it proposes a second-order DP method achieving state-of-the-art convergence rates that are independent of the inner-level dimension.
- Differentially Private High-dimensional Variable Selection via Integer Programming
-
This paper proposes two pure differentially private sparse variable selection methods (top-R and mistakes) that leverage modern mixed integer programming (MIP) techniques to efficiently explore non-convex objective landscapes, achieving state-of-the-art support recovery rates in high-dimensional settings (p up to 10,000) while providing theoretical recovery guarantees.
- Distributional Adversarial Attacks and Training in Deep Hedging
-
This paper is the first to introduce distributional adversarial attacks into the deep hedging framework. It proposes computationally tractable adversarial training methods based on Wasserstein balls (WPGD and WBPGD), achieving substantial improvements in robustness and out-of-sample performance under distribution shift and real market data.
- Double Descent Meets Out-of-Distribution Detection: Theoretical Insights and Empirical Analysis
-
This paper is the first to reveal a double descent phenomenon in post-hoc OOD detection: as model width increases, OOD detection performance exhibits a valley near the interpolation threshold and then recovers. It provides a theoretical explanation via random matrix theory and proposes an NC1 criterion based on Neural Collapse to identify the optimal model complexity regime.
- Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via In-the-wild Cascading Flow Optimization
-
This paper proposes the Dual-Flow framework, which leverages the forward ODE flow of a pretrained diffusion model and the reverse flow of a fine-tuned LoRA velocity function to perform multi-target, instance-agnostic adversarial attacks. Through a cascading distribution shift training strategy, the method significantly improves transfer attack success rates (e.g., +34.58% from Inc-v3 to Res-152) and demonstrates strong robustness against defended models.
- Efficient Fairness-Performance Pareto Front Computation
-
This paper proposes MIFPO, a method that efficiently computes the fairness-performance Pareto front without training complex fair representation models, by theoretically reducing the problem to a compact discrete concave optimization problem.
- Efficient Verified Machine Unlearning for Distillation
-
This paper proposes PURGE, a framework that extends verified unlearning under SISA to the knowledge distillation (KD) setting via teacher–student constituent mapping and an incremental multi-teacher distillation strategy. When a teacher-side unlearning request is issued, only a subset of student constituents requires retraining, achieving at least \(N\times\) speedup.
- Enhancing Graph Classification Robustness with Singular Pooling
-
This paper presents the first systematic analysis of how flat pooling operators (Sum/Avg/Max) affect adversarial robustness in graph classification. It derives adversarial risk upper bounds for each operator and proposes RS-Pool—a method that constructs graph-level representations from the dominant singular vector of the node embedding matrix—achieving significant robustness improvements without sacrificing clean accuracy.
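The pooling idea in the summary above can be sketched roughly as follows. This is a minimal illustration, assuming the graph-level representation is the dominant right singular vector of an `(n_nodes, d)` embedding matrix; the function name `singular_pool` and the sign-fixing convention are assumptions for this sketch, not details taken from the paper, and any rescaling the method may apply is omitted.

```python
import numpy as np

def singular_pool(H: np.ndarray) -> np.ndarray:
    """Pool an (n_nodes, d) node-embedding matrix into a single d-vector.

    Hypothetical helper: returns the dominant right singular vector of H.
    """
    # Thin SVD: H = U @ diag(s) @ Vt, with singular values in descending order.
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    v = Vt[0]
    # Singular vectors are only defined up to sign. As an assumed convention,
    # flip the vector so that its largest-magnitude entry is positive.
    idx = np.argmax(np.abs(v))
    if v[idx] < 0:
        v = -v
    return v

# Rank-1 example: every node embedding is a multiple of [3, 4].
H = np.outer([1.0, 2.0, 3.0], [3.0, 4.0])
pooled = singular_pool(H)
```

Because the right singular vectors depend only on `H.T @ H`, reordering the nodes (rows of `H`) leaves the pooled vector unchanged, which is the permutation invariance a graph readout needs.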
- Environment Inference for Learning Generalizable Dynamical System
-
This paper proposes DynaInfer, a framework that infers environment labels for unlabeled trajectories by analyzing the prediction errors of a fixed neural network, enabling generalizable dynamical system learning without environment annotations. DynaInfer matches or surpasses Oracle (known-label) performance on ODE/PDE systems.
- Exploration of Incremental Synthetic Non-Morphed Images for Single Morphing Attack Detection
-
This paper systematically investigates the effect of incrementally introducing synthetic non-morphed face images into Single Morphing Attack Detection (S-MAD) training. Results show that a moderate proportion of synthetic data (~75% increment) can improve cross-dataset generalization (EER reduced from 6.17% to 6.10%), while excessive use or training exclusively on synthetic data leads to severe performance degradation (EER rising to ~38%).
- Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
-
This paper proposes DecoRemoval, a framework that achieves data removal without full retraining via two modules: discriminability-preserving factor decorrelation (RFF-based spatial mapping with adaptive weighting) and smoothed loss perturbation. The method significantly outperforms existing approaches, particularly under out-of-distribution (OOD) settings.
- Fair Minimum Labeling: Efficient Temporal Network Activations for Reachability and Equity
-
This paper introduces the Fair Minimum Labeling (FML) problem, which aims to design minimum-cost temporal edge activation schemes ensuring sufficient temporal-path reachability for each node group in a network to satisfy fair coverage requirements. The paper proves FML is NP-hard and inapproximable beyond a certain factor, and provides an approximation algorithm based on probabilistic tree embeddings that matches the hardness lower bound.
- FairContrast: Enhancing Fairness through Contrastive Learning and Customized Augmentation
-
FairContrast proposes a fair contrastive learning framework for tabular data. By strategically selecting positive pairs—pairing advantaged-group samples with favorable outcomes against their disadvantaged-group counterparts—and training end-to-end with supervised or self-supervised contrastive loss combined with cross-entropy loss, the framework achieves significant bias reduction with minimal accuracy loss, without introducing any additional fairness constraint losses.
- Fairness-Regularized Online Optimization with Switching Costs
-
This paper is the first to rigorously integrate long-term fairness and action smoothness into a unified online optimization framework. It first establishes that the original problem is fundamentally intractable under standard dynamic benchmarks, then proposes FairOBD, which converts the fairness cost into an online form via auxiliary variables and dual mirror descent, achieving an asymptotically optimal competitive ratio under the more principled \((R, \delta)\)-constrained benchmark.
- Fairness under Competition
-
This paper is the first to study the joint fairness of multiple fair classifiers operating in a competitive environment. It theoretically demonstrates that even when each individual classifier satisfies Equal Opportunity (EO), the ecosystem as a whole may remain unfair, and that applying fairness adjustments to a biased classifier can paradoxically reduce ecosystem-level fairness.
- ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization
-
ForensicHub introduces the first unified benchmark platform spanning all domains (Deepfake/IMDL/AIGC/Document Tampering) for fake image detection and localization, encompassing 4 tasks, 23 datasets, 42 models, 6 backbone networks, and 11 GPU-accelerated evaluation metrics. Through a modular architecture and adapter design, it bridges domain silos and conducts 16 cross-domain evaluations to derive 8 key insights.
- Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing
-
This paper proposes RawMed, the first framework to synthesize multi-table time series EHR data from raw records with minimal lossy preprocessing. Events are textualized and compressed into a discrete latent space via Residual Quantization, and their temporal dynamics are then modeled with an autoregressive Transformer. RawMed comprehensively outperforms existing baselines in fidelity, clinical utility, and privacy protection.
- Harnessing Feature Resonance under Arbitrary Target Alignment for Out-of-Distribution Node Detection
-
This paper discovers the Feature Resonance phenomenon—when optimizing the representations of known in-distribution (ID) nodes, unknown ID nodes undergo significantly larger representational changes than OOD nodes, and this phenomenon is label-agnostic. Based on this observation, the authors propose RSL, a graph OOD node detection framework that requires no multi-class labels, achieving state-of-the-art performance across 13 datasets.
- Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning
-
This paper theoretically and empirically demonstrates a power-law relationship between membership inference attack (MIA) vulnerability and the number of samples per class in deep transfer learning: as the per-class sample count \(S\) increases, MIA advantage decays as \(S^{-1/2}\). However, the amount of data required to protect the most vulnerable samples is prohibitively large, highlighting the irreplaceable role of formal differential privacy guarantees.
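To make the power law above concrete, here is a short worked calculation. It assumes the advantage can be written as \(\mathrm{adv}(S) \approx c\,S^{-1/2}\) for some constant \(c\) that does not depend on \(S\); the constant and the notation \(\mathrm{adv}\) are introduced here only for illustration.

\[
\frac{\mathrm{adv}(kS)}{\mathrm{adv}(S)} \approx \frac{c\,(kS)^{-1/2}}{c\,S^{-1/2}} = k^{-1/2}
\]

Under this assumption, quadrupling the per-class sample count (\(k = 4\)) only halves the advantage, and a tenfold reduction requires \(k = 100\), which is consistent with the summary's point that data volume alone is a slow route to protection.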
- Improved Balanced Classification with Theoretically Grounded Loss Functions
-
Two theory-driven surrogate loss families are proposed—Generalized Logit-Adjusted (GLA) loss and Generalized Class-Aware weighted (GCA) loss—providing stronger theoretical guarantees and improved empirical performance for multi-class classification under class imbalance.
- Incentivizing Time-Aware Fairness in Data Sharing
-
This paper proposes a time-aware data sharing framework that introduces new incentive conditions (F6–F8) and two reward schemes—Time-Aware Reward Cumulation and Time-Aware Data Valuation—to ensure that participants who join a collaboration earlier receive higher-value rewards, while simultaneously preserving fairness and individual rationality.
- Influence Functions for Edge Edits in Non-Convex Graph Neural Networks
-
This paper proposes influence functions for edge edits applicable to non-convex GNNs. By leveraging the proximal Bregman response function (PBRF), the method relaxes the convexity assumption and jointly accounts for both parameter shift and message propagation effects, supporting both edge deletion and insertion.
- It's Complicated: The Relationship of Algorithmic Fairness and Non-Discrimination Provisions for High-Risk Systems in the EU AI Act
-
This paper systematically analyzes the complex relationship between the non-discrimination provisions for high-risk AI systems in the EU AI Act (AIA) and the field of algorithmic fairness in machine learning. It reveals critical gaps in the legal text concerning input-side bias detection, the absence of output-side protections, and standardization challenges, providing a foundational framework for interdisciplinary collaboration between computer science and law.
- Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification
-
This paper systematically evaluates compression-based adversarial purification defenses and demonstrates that the realism of reconstructed images is the critical factor for robustness—high-realism compression models maintain significant robustness under strong adaptive attacks, and this robustness is not attributable to gradient masking.
- Learning-Augmented Facility Location Mechanisms for Envy Ratio
-
For the envy ratio objective in one-dimensional facility location, this paper designs both deterministic and randomized learning-augmented mechanisms: the deterministic \(\alpha\)-BIM achieves an optimal consistency–robustness tradeoff, while the randomized BAM further improves the guarantees. The paper also resolves an open problem posed by Ding et al., improving the approximation ratio of prediction-free randomized mechanisms from 2 to approximately 1.8944.
- Locally Optimal Private Sampling: Beyond the Global Minimax
-
Under local differential privacy (LDP), this paper proposes a local minimax framework that leverages neighborhood constraints defined by a public distribution \(P_0\) to derive closed-form optimal samplers, achieving consistent and significant improvements over the global minimax sampler both theoretically and empirically.
- Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research
-
This paper systematically identifies five fundamental mismatches between machine unlearning techniques and policy objectives in the context of generative AI, arguing that machine unlearning cannot serve as a universal solution for privacy, copyright, or safety concerns, and provides a practical conceptual framework for both ML researchers and policymakers.
- MARS: A Malignity-Aware Backdoor Defense in Federated Learning
-
This paper proposes MARS, a defense method that quantifies the malignity of local models by computing per-neuron Backdoor Energy (BE), and leverages Wasserstein distance-based clustering to effectively identify backdoor models in federated learning.
- Matchings Under Biased and Correlated Evaluations
-
This paper introduces a correlation parameter \(\gamma\) (the degree of alignment between institutional evaluations) into a two-institution stable matching model, and analyzes how bias \(\beta\) and correlation \(\gamma\) jointly affect the representation ratio of disadvantaged groups. It proves that even a slight loss of correlation can cause a sharp drop in representation, and characterizes the Pareto frontier of fairness interventions.
- Mitigating Disparate Impact of Differentially Private Learning through Bounded Adaptive Clipping
-
By introducing a tunable lower bound into adaptive gradient clipping (bounded adaptive clipping), this work prevents the clipping bound from shrinking excessively during training, thereby improving accuracy for minority groups and mitigating algorithmic unfairness under DP constraints.
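A minimal sketch of the bounding idea, assuming a quantile-style geometric update for the clipping threshold. The update rule, the function name `update_clip_bound`, and all default values are illustrative assumptions rather than the paper's algorithm. In a real DP pipeline the fraction of gradients below the threshold would also need to be privatized, which is omitted here.

```python
import math

def update_clip_bound(clip, grad_norms, target_quantile=0.5, lr=0.2, lower_bound=0.1):
    """One adaptive-clipping step with a floor on the threshold (hypothetical helper)."""
    # Fraction of per-sample gradient norms that already fit under the threshold.
    frac_below = sum(n <= clip for n in grad_norms) / len(grad_norms)
    # Assumed geometric update: shrink the threshold when too many norms fit
    # under it, grow it when too few do.
    new_clip = clip * math.exp(-lr * (frac_below - target_quantile))
    # The bounded variant: never let the threshold fall below the floor.
    return max(new_clip, lower_bound)

# With uniformly tiny gradient norms, an unbounded rule would keep shrinking.
clip = 1.0
for _ in range(100):
    clip = update_clip_bound(clip, [0.01, 0.01, 0.01])
```

In this toy run the threshold decays geometrically until it reaches `lower_bound` and then stays there, which is the behavior the summary describes as preventing excessive shrinkage.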
- Mitigating Privacy-Utility Trade-off in Decentralized Federated Learning via f-Differential Privacy
-
This paper proposes two privacy accounting methods for decentralized federated learning under the f-DP framework—PN-f-DP and Sec-f-LDP—which leverage hypothesis-testing-based privacy measures to consistently yield tighter privacy bounds than Rényi DP, thereby reducing noise injection and improving model utility under equivalent privacy guarantees.
- Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning
-
This paper proposes Per-layer Model Inversion (PMI) for data-free continual learning to accelerate synthetic image generation, and mitigates the feature drift between synthetic and real data via class-level Gaussian feature modeling and contrastive learning, achieving efficient and high-quality data-free knowledge replay.
- Multi-Class Support Vector Machine with Differential Privacy
-
This paper proposes the PMSVM framework, which exploits the single-pass data access property of all-in-one multi-class SVMs. By combining weight perturbation and gradient perturbation, PMSVM substantially reduces the privacy budget consumption of multi-class SVMs under differential privacy, achieving a superior privacy–utility trade-off.
- Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor
-
This paper presents the first hypothesis selection algorithm under the central differential privacy model that simultaneously achieves nearly-linear time complexity and the optimal approximation factor \(\alpha=3\), resolving an open problem posed by Bun et al. (NeurIPS 2019).
- Not All Deepfakes Are Created Equal: Triaging Audio Forgeries for Robust Deepfake Singer Identification
-
This paper proposes a two-stage pipeline grounded in the premise that the most harmful deepfakes are those of the highest quality. A discriminator first filters out low-quality forgeries to reduce noise; a singer identification model trained exclusively on genuine recordings then performs voiceprint matching. The pipeline consistently outperforms baselines across multiple datasets.
- OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction
-
OmniFC is proposed as a model-agnostic federated clustering framework that exactly reconstructs the global pairwise distance matrix over a finite field via Lagrange coded computing, enabling any centralized clustering method (K-Means / Spectral Clustering / DBSCAN / Hierarchical Clustering, etc.) to run directly on the reconstructed matrix. The framework requires only a single communication round, is inherently robust to Non-IID data, and comprehensively outperforms specialized methods such as k-FED, MUFC, and FedSC across 7 datasets.
- On the Hardness of Conditional Independence Testing In Practice
-
This paper systematically analyzes why kernel-based conditional independence (CI) tests fail in practice. It identifies estimation error in conditional mean embeddings as the central driver of Type-I error inflation, and formally characterizes the tension in choosing the conditioning kernel \(k_C\): the choice is critical for test power, yet it can also exacerbate false positives.
- Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring
-
This paper introduces Open-Insect — the first large-scale fine-grained open-set recognition benchmark for insect species discovery, spanning three geographic regions and three types of open-set splits. It systematically evaluates 38 OSR algorithms, finding that simple posterior methods (e.g., MSP) remain strong baselines in fine-grained settings, and demonstrates the critical role of domain-relevant auxiliary data in improving OSR performance.
- Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect
-
This paper establishes three foundational theoretical results for the weighted controlled direct effect (WCDE): necessary and sufficient conditions for unique identifiability, derivation of the influence function for nonparametric estimation, and characterization of the optimal covariate adjustment set that minimizes asymptotic variance.
- Position: Bridge the Gaps between Machine Unlearning and AI Regulation
-
This paper systematically analyzes six potential application scenarios of Machine Unlearning (MU) in compliance with the EU AI Act (AIA), identifies the technical gaps between the state of the art and actual regulatory requirements in each scenario, and calls on the research community to bridge these gaps in order to realize the potential of MU in AI governance.
- Preserving Task-Relevant Information Under Linear Concept Removal
-
SPLINCE constructs an oblique projection that simultaneously guarantees linear guardedness (i.e., sensitive attributes cannot be predicted by any linear classifier) and exactly preserves the covariance between representations and task labels, thereby resolving the problem of existing concept erasure methods inadvertently removing task-relevant information alongside sensitive concepts.
- Private Continual Counting of Unbounded Streams
-
This paper proposes a novel matrix factorization method based on logarithmic perturbation, achieving for the first time a differentially private continual counting algorithm that simultaneously satisfies the three properties of "unbounded streams," "smooth error," and "near-optimal asymptotic error," with variance \(O(\log^{2+2\alpha}(t))\) at time step \(t\) for any \(\alpha > 0\).
- Provable Watermarking for Data Poisoning Attacks
-
This paper proposes two provable watermarking schemes—post-poisoning watermarking and poisoning-concurrent watermarking—that provide transparency declaration mechanisms for data poisoning attacks. The theoretical analysis demonstrates that, under specific watermark length conditions, both watermark detectability and poisoning effectiveness can be simultaneously guaranteed.
- PubSub-VFL: Towards Efficient Two-Party Split Learning in Heterogeneous Environments via Publisher/Subscriber Architecture
-
This paper proposes PubSub-VFL, an efficient two-party vertical federated learning framework based on a publisher/subscriber architecture. Through a hierarchical asynchronous mechanism and system-profiling-based hyperparameter optimization, it achieves 2–7× training speedup and up to 91% computational resource utilization while preserving privacy and model accuracy.
- Reconstruction and Secrecy under Approximate Distance Queries
-
Under the approximate distance query model, this paper studies the reconstruction game from a learning-theoretic perspective, proves a geometric characterization of the optimal reconstruction error as the Chebyshev radius, and provides a complete classification of pseudo-finiteness for Euclidean convex spaces.
- Redundancy-Aware Test-Time Graph Out-of-Distribution Detection
-
This paper proposes RedOUT, a framework that constructs coding trees via structural entropy minimization to eliminate redundant information in graph structures. Combined with the Redundancy-aware Graph Information Bottleneck (ReGIB) principle, RedOUT effectively distinguishes in-distribution (ID) from out-of-distribution (OOD) graph samples at test time without modifying pretrained model parameters, achieving an average AUC of 87.46% across 10 dataset pairs.
- Revisiting Logit Distributions for Reliable Out-of-Distribution Detection
-
This paper proposes LogitGap, a novel post-hoc OOD detection scoring function that explicitly exploits the "gap" between the maximum logit and the remaining logits to distinguish in-distribution (ID) from out-of-distribution (OOD) samples. A top-N selection strategy is introduced to filter noisy logits. Theoretical analysis and experiments demonstrate that LogitGap outperforms MCM and MaxLogit across multiple scenarios.
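A minimal sketch of a gap-style score, assuming the score is the maximum logit minus the mean of the next largest logits within the top N. The exact LogitGap formula is not given in this note, so the function name `logit_gap_score`, the `top_n` default, and the choice of averaging are illustrative assumptions.

```python
import numpy as np

def logit_gap_score(logits, top_n=5):
    """Gap between the largest logit and the mean of the other top-N logits.

    Hypothetical scoring function; requires top_n >= 2. Works on a single
    logit vector or on a batch with classes along the last axis.
    """
    logits = np.asarray(logits, dtype=float)
    # Sort descending along the class axis and keep only the top-N logits,
    # discarding the low-ranked (assumed noisy) tail.
    top = -np.sort(-logits, axis=-1)[..., :top_n]
    max_logit = top[..., 0]
    rest = top[..., 1:]
    return max_logit - rest.mean(axis=-1)

confident = logit_gap_score([10.0, 1.0, 1.0, 1.0, -50.0], top_n=4)
flat = logit_gap_score([2.0, 2.0, 2.0, 2.0, 2.0])
```

Under this reading, a peaked logit vector yields a large score and a flat one yields a score near zero, so a threshold on the score would separate confident ID-like predictions from OOD-like ones. Truncating to the top N keeps the extreme low logit (`-50.0` above) from affecting the result.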
- Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
-
This paper proposes R2D (Rewind-to-Delete), the first first-order, black-box certified machine unlearning algorithm for general nonconvex loss functions. It achieves data deletion by rewinding to an earlier checkpoint in the training trajectory and then performing gradient descent on the retained data, while providing \((\varepsilon, \delta)\)-certified unlearning guarantees and theoretical trade-offs among privacy, utility, and efficiency.
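The rewind-then-retrain procedure described above can be sketched on a toy problem. This sketch uses least-squares regression as a convex stand-in (the paper targets nonconvex losses), and the helper names `train` and `rewind_to_delete`, the step counts, and the learning rate are all assumptions. Any calibrated noise that the certificate may require is omitted.

```python
import numpy as np

def grad(w, X, y):
    # Gradient of the mean squared error 0.5 * mean((X @ w - y) ** 2).
    return X.T @ (X @ w - y) / len(y)

def train(X, y, steps, lr=0.1):
    """Plain gradient descent that stores every iterate as a checkpoint."""
    w = np.zeros(X.shape[1])
    checkpoints = [w.copy()]
    for _ in range(steps):
        w = w - lr * grad(w, X, y)
        checkpoints.append(w.copy())
    return w, checkpoints

def rewind_to_delete(checkpoints, X_retain, y_retain, rewind_step, extra_steps, lr=0.1):
    """Rewind to an earlier checkpoint, then descend on the retained data only."""
    w = checkpoints[rewind_step].copy()
    for _ in range(extra_steps):
        w = w - lr * grad(w, X_retain, y_retain)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5])
_, ckpts = train(X, y, steps=50)

# Delete the first five samples: rewind to step 10 and continue on the rest.
X_retain, y_retain = X[5:], y[5:]
w_unlearned = rewind_to_delete(ckpts, X_retain, y_retain, rewind_step=10, extra_steps=200)
```

Only stored checkpoints and first-order gradients on the retained data are needed, which matches the "first-order, black-box" framing. Choosing `rewind_step` closer to the start costs more compute but moves the result closer to retraining from scratch.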
- Robust Graph Condensation via Classification Complexity Mitigation
-
This paper reveals that graph condensation (GC) is fundamentally a process of reducing classification complexity, and that adversarial attacks precisely undermine this property. Based on this insight, the authors propose the MRGC framework, which enhances GC robustness through three manifold-based regularization modules: intrinsic dimensionality regularization, curvature-aware manifold smoothing, and inter-class manifold decoupling. This work represents the first systematic study of GC robustness under simultaneous perturbations of structure, features, and labels.
- Sequentially Auditing Differential Privacy
-
This paper proposes a differential privacy auditing framework based on sequential hypothesis testing and kernel MMD statistics, enabling valid detection of privacy violations at any point during streaming mechanism outputs. The approach reduces the required sample count from 50K (as needed by existing methods) to just a few hundred, and can identify DP-SGD privacy violations within less than one full training run.
- Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy
-
This paper establishes novel high-probability perturbation bounds for low-rank approximation of symmetric matrices under the spectral norm, improving upon the classical Eckart–Young–Mirsky theorem, and resolves an open problem in differentially private PCA.
- SPROD: Spurious-Aware Prototype Refinement for Reliable Out-of-Distribution Detection
-
SPROD is a post-hoc OOD detection method designed to handle spurious correlations in training data. It subdivides each class prototype into "correctly classified" and "misclassified" subgroups (the latter sharing spurious features), combined with K-means-style refinement and distance-based (generative) scoring. Across 5 spurious-correlation OOD benchmarks, it achieves an average AUROC of 85.1% (+4.8% vs. runner-up KNN) and FPR@95 of 49.0% (−9.3% vs. runner-up).
- Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization
-
This paper proposes a novel search direction and proves that first-order and zeroth-order online bilevel optimization algorithms built upon it achieve sublinear stochastic bilevel regret guarantees without requiring window smoothing, while improving efficiency through reduced oracle dependence, parallel updates, and zeroth-order Hessian/Jacobian estimation.
- Taught Well, Learned Ill: Towards Distillation-Conditional Backdoor Attack
-
This paper proposes the Distillation-Conditional Backdoor Attack (DCBA) paradigm and its instantiation SCAR, which embeds a "dormant" backdoor into a teacher model via bi-level optimization. The backdoor remains undetectable on the teacher model but is activated and transferred to the student model during knowledge distillation, even when the distillation dataset is entirely clean.
- The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples
-
This paper identifies a critical security vulnerability in machine unlearning: even when an unlearned model is statistically indistinguishable from a retrained model, applying small adversarial perturbations to forgotten samples causes the unlearned model to correctly classify them while the retrained model fails — revealing a novel privacy risk termed "residual knowledge." The authors propose RURK, a fine-tuning strategy that penalizes correct predictions on perturbed forgotten samples, effectively suppressing residual knowledge across 11 unlearning methods on CIFAR-10 and ImageNet-100.
- Towards Unsupervised Open-Set Graph Domain Adaptation via Dual Reprogramming
-
This paper proposes GraphRTA, a framework that addresses the challenges of known-class classification and unknown-class detection in unsupervised open-set graph domain adaptation through two complementary mechanisms: model reprogramming (gradient-guided weight pruning) and graph reprogramming (target graph structure and feature optimization), without requiring manually specified thresholds.
- Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits
-
This paper theoretically establishes that the adversarial robustness of Neural Probabilistic Circuits (NPC) depends solely on the attribute recognition model and is independent of the probabilistic circuit. Building on this finding, it proposes RNPC, which achieves provably improved robustness via class-wise inference aggregation, significantly enhancing adversarial robustness while maintaining benign accuracy.
- Understanding Challenges to the Interpretation of Disaggregated Evaluations of AI
-
Through causal graphical modeling, this paper demonstrates that performance disparities across subgroups in disaggregated evaluations do not necessarily indicate unfairness, but may instead reflect natural consequences of distributional differences in the data-generating process. The authors recommend supplementing standard disaggregated evaluations with causal assumptions and weighted evaluation methods.
- Unifying Proportional Fairness in Centroid and Non-Centroid Clustering
-
This paper unifies the study of proportional fairness in centroid and non-centroid clustering under a single "semi-centroid clustering" framework, establishes an impossibility theorem showing the two cannot be simultaneously achieved, and designs novel algorithms that attain constant-factor core guarantees under dual metric loss.
- Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
-
Under the f-DP framework grounded in hypothesis testing, this paper provides a unified characterization of three classes of privacy risks in differential privacy — re-identification, attribute inference, and data reconstruction — yielding tighter and consistent risk upper bounds that enable a 20% reduction in noise without compromising security guarantees.
- Unlocking Transfer Learning for Open-World Few-Shot Recognition
-
A two-stage framework is proposed that combines open-set-aware meta-learning with open-set-free transfer learning, achieving the first successful application of the transfer learning paradigm to few-shot open-set recognition (FSOSR) and reaching SOTA on miniImageNet and tieredImageNet.