🛡️ AI Safety
🧠 NeurIPS 2025 · 73 paper notes
- A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers
-
This paper proposes a set of generalized components (Component A/B/C) that establish a bidirectional collaborative relationship between sample selection and trigger design, simultaneously improving the attack success rate (ASR) and stealthiness of Poison-only Clean-label Backdoor Attacks (PCBA), with strong generalizability across multiple attack types.
- AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift
-
Inspired by biological sensory systems, this position paper argues that AI research must shift from simply scaling models to optimizing inputs—by dynamically adjusting sensor-level parameters (exposure, gain, multimodal configuration, etc.) to produce inputs most favorable to the model. Under ideal sensor adaptation, a small model (EfficientNet-B0, 5M parameters) can outperform a large model (OpenCLIP-H, 632M parameters), and the paper proposes a progressive formalization framework ranging from single-shot perception to closed-loop perception–action coupling.
- Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
-
This work constructs the Infinity-Chat dataset (26K open-ended real user queries + 31,250 human annotations) to reveal the "Artificial Hivemind" phenomenon in open-ended language model generation—characterized by severe intra-model repetition and inter-model homogeneity—and demonstrates that Reward Models and LM Judges fail to calibrate on samples with high inter-annotator preference disagreement.
- Beyond Last-Click: An Optimal Mechanism for Ad Attribution
-
This paper analyzes the strategic manipulation vulnerabilities of the Last-Click attribution mechanism from a game-theoretic perspective—platforms can obtain unfair attribution credit by falsifying timestamps—and proposes the Peer-Validated Mechanism (PVM), in which each platform's credit depends solely on the reports of other platforms (analogous to peer review). The paper theoretically proves that PVM is dominant strategy incentive compatible (DSIC) and optimal under homogeneous settings, improving attribution accuracy from 34% to 75% in the two-platform case.
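A toy sketch of the peer-validation idea (not the paper's actual PVM; the report format and the credit rule below are illustrative assumptions): each platform's credit is computed only from what the *other* platforms report, so falsifying one's own report cannot raise one's own credit.

```python
def peer_validated_credit(reports):
    """Illustrative peer-validated credit rule (hypothetical format):
    reports[j] maps platform ids to the ad events that platform j observed
    being served by that platform. Platform i's credit is computed from the
    reports of platforms j != i only."""
    platforms = list(reports)
    raw = {}
    for i in platforms:
        seen = set()
        for j in platforms:
            if j != i:
                # only peer reports about platform i count toward i's credit
                seen |= set(reports[j].get(i, ()))
        raw[i] = len(seen)
    total = sum(raw.values())
    return {i: (raw[i] / total if total else 0.0) for i in platforms}
```

In this toy rule, inflating one's own report leaves one's own credit unchanged, which is the intuition behind the DSIC property; the actual mechanism and its optimality proof are in the paper.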
- Boosting Adversarial Transferability with Spatial Adversarial Alignment
-
This paper proposes Spatial Adversarial Alignment (SAA), which fine-tunes a surrogate model via two modules—spatial-aware alignment and adversarial-aware alignment—to align its features with those of a witness model, achieving significant improvements in cross-architecture adversarial transferability (CNN→ViT transfer rate improved by 25–39%).
- Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness
-
By embedding rotation-equivariant (P4 group) and scale-equivariant convolutional layers into CNNs, this work proposes two symmetry-aware architectures — Parallel and Cascaded — that significantly improve adversarial robustness without adversarial training. Grounded in the CLEVER framework, it theoretically demonstrates that equivariant architectures compress the hypothesis space, regularize gradients, and tighten certified robustness bounds.
- Causally Reliable Concept Bottleneck Models
-
This paper proposes C2BM (Causally reliable Concept Bottleneck Models), which organizes the concept bottleneck as a causal graph structure. By combining observational data with background knowledge, C2BM automatically learns causal relationships, achieving significantly improved causal reliability, intervention responsiveness, and fairness while maintaining classification accuracy.
- Cost Efficient Fairness Audit Under Partial Feedback
-
Under the partial feedback setting, this paper proposes a fairness auditing framework with a novel cost model, delivering near-optimal audit algorithms for both black-box and mixture model scenarios, reducing audit cost by approximately 50% compared to natural baselines.
- CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
-
This work extends MLE-Bench to construct 20 code-sabotage tasks and sandbagging evaluations. It finds that frontier AI agents can successfully plant backdoors and other sabotage while completing normal ML engineering tasks, and in some cases evade detection by LM monitors.
- Deceptron: Learned Local Inverses for Fast and Stable Physics Inversion
-
This paper proposes the Deceptron bidirectional module, which learns a local inverse of a differentiable forward surrogate and introduces a Jacobian Composition Penalty (JCP). By mapping output-space residuals back to the input space, the method achieves Gauss-Newton-like preconditioned gradient updates for physics inversion, dramatically reducing iteration counts (approximately 20× speedup on Heat-1D).
- DESIGN: Encrypted GNN Inference via Server-Side Input Graph Pruning
-
This paper proposes DESIGN, a framework that accelerates FHE-based GNN inference by approximately \(2\times\) over the SEAL baseline through two-stage server-side optimization—input graph pruning and adaptive polynomial activation degree allocation—while maintaining competitive accuracy.
- DictPFL: Efficient and Private Federated Learning on Encrypted Gradients
-
This paper proposes DictPFL, a framework that decomposes model weights into a static dictionary and a trainable lookup table, and combines this decomposition with encryption-aware pruning. DictPFL achieves full gradient protection via homomorphic encryption in federated learning while reducing communication overhead by 402–748× and training time by 28–65×, keeping total runtime within 2× of plaintext FL.
- Differential Privacy for Euclidean Jordan Algebra with Applications to Private Symmetric Cone Programming
-
This paper proposes a general Gaussian privacy mechanism based on Euclidean Jordan Algebra (EJA) and, building upon it, designs the first differentially private algorithm for Symmetric Cone Programming (SCP), thereby resolving an important open problem on differentially private semidefinite programming posed by Hsu et al. (ICALP 2014).
- Differentially Private Bilevel Optimization: Efficient Algorithms with Near-Optimal Rates
-
This paper systematically studies bilevel optimization under differential privacy (DP). For the convex setting, it establishes near-tight upper and lower bounds via the exponential mechanism and regularized exponential mechanism, matching the optimal rate of single-level DP-ERM. For the non-convex setting, it proposes a second-order DP method achieving state-of-the-art convergence rates that are independent of the inner-level dimension.
- Differentially Private High-dimensional Variable Selection via Integer Programming
-
This paper proposes two pure differentially private sparse variable selection methods (top-R and mistakes) that leverage modern mixed integer programming (MIP) techniques to efficiently explore non-convex objective landscapes, achieving state-of-the-art support recovery rates in high-dimensional settings (p up to 10,000) while providing theoretical recovery guarantees.
- Distributional Adversarial Attacks and Training in Deep Hedging
-
This paper is the first to introduce distributional adversarial attacks into the deep hedging framework. It proposes computationally tractable adversarial training methods based on Wasserstein balls (WPGD and WBPGD), achieving substantial improvements in robustness and out-of-sample performance under distribution shift and real market data.
- Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via In-the-wild Cascading Flow Optimization
-
This paper proposes the Dual-Flow framework, which leverages the forward ODE flow of a pretrained diffusion model and the reverse flow of a fine-tuned LoRA velocity function to perform multi-target, instance-agnostic adversarial attacks. Through a cascading distribution shift training strategy, the method significantly improves transfer attack success rates (e.g., +34.58% from Inc-v3 to Res-152) and demonstrates strong robustness against defended models.
- Efficient Fairness-Performance Pareto Front Computation
-
This paper proposes MIFPO, a method that efficiently computes the fairness-performance Pareto front without training complex fair representation models, by theoretically reducing the problem to a compact discrete concave optimization problem.
- Efficient Verified Machine Unlearning for Distillation
-
This paper proposes PURGE, a framework that extends verified unlearning under SISA to the knowledge distillation (KD) setting via teacher–student constituent mapping and an incremental multi-teacher distillation strategy. When a teacher-side unlearning request is issued, only a subset of student constituents requires retraining, achieving at least \(N\times\) speedup.
- Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
-
This work establishes the first practical benchmark for FL+DP in end-to-end ASR, achieving only 1.3%–4.6% absolute WER degradation under strong privacy guarantees by combining per-layer clipping with the layer-wise gradient normalization of the LAMB optimizer.
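The per-layer clipping at the heart of this recipe can be sketched in a few lines (a simplification with assumed names; the DP noise addition and LAMB's layer-wise normalization are not shown):

```python
import math

def per_layer_clip(grads, bounds):
    """Rescale each layer's gradient to its own norm bound (per-layer
    clipping), instead of applying one global clip to the concatenated
    gradient. DP-SGD would then add noise calibrated to these bounds."""
    out = []
    for g, c in zip(grads, bounds):
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, c / norm) if norm > 0 else 1.0
        out.append([x * scale for x in g])
    return out
```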
- Enhancing Graph Classification Robustness with Singular Pooling
-
This paper presents the first systematic analysis of how flat pooling operators (Sum/Avg/Max) affect adversarial robustness in graph classification. It derives adversarial risk upper bounds for each operator and proposes RS-Pool—a method that constructs graph-level representations from the dominant singular vector of the node embedding matrix—achieving significant robustness improvements without sacrificing clean accuracy.
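The singular-vector readout can be sketched as follows (a minimal interpretation of "graph representation from the dominant singular vector"; the paper's exact construction may differ):

```python
import numpy as np

def rs_pool(H):
    """Graph-level readout from the dominant singular pair of the node
    embedding matrix H (n_nodes x d): the top right-singular vector scaled
    by the top singular value. The sign is fixed for determinism, since
    SVD singular vectors are only defined up to sign."""
    U, S, Vt = np.linalg.svd(H, full_matrices=False)
    v = Vt[0]
    if v[np.argmax(np.abs(v))] < 0:
        v = -v
    return S[0] * v
```

Unlike Sum/Avg pooling, this readout depends on the whole embedding matrix's leading subspace, which is the source of the robustness the paper analyzes.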
- Environment Inference for Learning Generalizable Dynamical System
-
This paper proposes DynaInfer, a framework that infers environment labels for unlabeled trajectories by analyzing the prediction errors of a fixed neural network, enabling generalizable dynamical system learning without environment annotations. DynaInfer matches or surpasses Oracle (known-label) performance on ODE/PDE systems.
- Exploration of Incremental Synthetic Non-Morphed Images for Single Morphing Attack Detection
-
This paper systematically investigates the effect of incrementally introducing synthetic non-morphed face images into Single Morphing Attack Detection (S-MAD) training. Results show that a moderate proportion of synthetic data (~75% increment) can improve cross-dataset generalization (EER reduced from 6.17% to 6.10%), while excessive use or training exclusively on synthetic data leads to severe performance degradation (EER rising to ~38%).
- Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
-
This paper proposes DecoRemoval, a framework that achieves data removal without full retraining via two modules: discriminability-preserving factor decorrelation (RFF-based spatial mapping with adaptive weighting) and smoothed loss perturbation. The method significantly outperforms existing approaches, particularly under out-of-distribution (OOD) settings.
- Fair Minimum Labeling: Efficient Temporal Network Activations for Reachability and Equity
-
This paper introduces the Fair Minimum Labeling (FML) problem, which aims to design minimum-cost temporal edge activation schemes ensuring sufficient temporal-path reachability for each node group in a network to satisfy fair coverage requirements. The paper proves FML is NP-hard and inapproximable beyond a certain factor, and provides an approximation algorithm based on probabilistic tree embeddings that matches the hardness lower bound.
- Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference
-
This paper proposes FRG (Fair Representation learning with high-confidence Guarantees), the first fair representation learning framework that allows users to specify a fairness threshold \(\varepsilon\) and confidence level \(1-\delta\). By combining VAE-based candidate selection, adversarial inference that maximizes covariance, and a Student's t-test to construct a high-confidence upper bound, FRG guarantees that \(\Delta_{DP} \leq \varepsilon\) holds with probability at least \(1-\delta\) for any downstream model and task.
- FairContrast: Enhancing Fairness through Contrastive Learning and Customized Augmentation
-
FairContrast proposes a fair contrastive learning framework for tabular data. By strategically selecting positive pairs—pairing advantaged-group samples with favorable outcomes against their disadvantaged-group counterparts—and training end-to-end with supervised or self-supervised contrastive loss combined with cross-entropy loss, the framework achieves significant bias reduction with minimal accuracy loss, without introducing any additional fairness constraint losses.
- Fairness-Regularized Online Optimization with Switching Costs
-
This paper is the first to rigorously integrate long-term fairness and action smoothness into a unified online optimization framework. It first establishes that the original problem is fundamentally intractable under standard dynamic benchmarks, then proposes FairOBD, which converts the long-term fairness cost into online-computable terms via auxiliary variables and dual mirror descent, achieving an asymptotically optimal competitive ratio under the more principled \((R, \delta)\)-constrained benchmark.
- Fairness under Competition
-
This paper is the first to study the joint fairness of multiple fair classifiers operating in a competitive environment. It theoretically demonstrates that even when each individual classifier satisfies Equal Opportunity (EO), the ecosystem as a whole may remain unfair, and that applying fairness adjustments to a biased classifier can paradoxically reduce ecosystem-level fairness.
- FedFACT: A Provable Framework for Controllable Group-Fairness Calibration in Federated Learning
-
This paper proposes FedFACT, a framework that characterizes the structure of the Bayes-optimal fair classifier under federated learning, and reduces fair federated learning to personalized cost-sensitive learning (in-processing) and bi-level optimization (post-processing), respectively. It is the first to achieve controllable coordination between global and local fairness in multi-class settings, with convergence and generalization guarantees.
- FLUX: Efficient Descriptor-Driven Clustered Federated Learning under Arbitrary Distribution Shifts
-
FLUX extracts compact distribution descriptors on the client side (mean/covariance of the marginal \(P(X)\) plus class-conditional mean/covariance for \(P(Y|X)\)), clusters them on the server with adaptive DBSCAN to automatically determine the number of clusters and the group assignments, and trains cluster-specific models. At test time, unlabeled new clients are matched to the best model using feature descriptors alone. It is the first method to handle four types of distribution shift simultaneously, with communication overhead comparable to FedAvg.
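A heavily simplified version of the client-side descriptor (the function name and exact statistics are assumptions; Flux also uses class-conditional covariances and server-side adaptive DBSCAN, neither shown here):

```python
import numpy as np

def client_descriptor(X, y):
    """Toy distribution descriptor in the spirit of Flux: the marginal mean
    and covariance of X, plus per-class conditional means, flattened into one
    vector that a server could cluster without seeing raw data."""
    parts = [X.mean(axis=0), np.cov(X, rowvar=False).ravel()]
    for c in np.unique(y):
        parts.append(X[y == c].mean(axis=0))
    return np.concatenate(parts)
```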
- ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization
-
ForensicHub introduces the first unified benchmark platform spanning all domains (Deepfake/IMDL/AIGC/Document Tampering) for fake image detection and localization, encompassing 4 tasks, 23 datasets, 42 models, 6 backbone networks, and 11 GPU-accelerated evaluation metrics. Through a modular architecture and adapter design, it bridges domain silos and conducts 16 cross-domain evaluations to derive 8 key insights.
- Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning
-
This paper theoretically derives and empirically validates a power-law relationship between membership inference attack (MIA) vulnerability and the per-class sample count \(S\) in deep transfer learning: \(\log(\text{tpr}-\text{fpr}) = -\beta_S \log(S) - \beta_0\), with the MIA advantage decaying roughly as \(S^{-1/2}\). Increasing data volume reduces both average and worst-case vulnerability, but the amount of data required to protect the most vulnerable samples is prohibitively large, highlighting the irreplaceable role of formal differential privacy guarantees.
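The fitted power law can be turned into a back-of-envelope calculator (the beta values below are placeholders, not the paper's fitted coefficients):

```python
import math

def mia_advantage(S, beta_S=0.5, beta_0=0.0):
    """Predicted MIA advantage (tpr - fpr) under the power law
    log(tpr - fpr) = -beta_S * log(S) - beta_0, where S is the per-class
    sample count. The paper fits beta_S, beta_0 per model/dataset."""
    return math.exp(-beta_0) * S ** (-beta_S)
```

With \(\beta_S = 1/2\), quadrupling the per-class sample count only halves the predicted advantage, which is why data volume alone cannot protect the most vulnerable samples.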
- Improved Balanced Classification with Theoretically Grounded Loss Functions
-
Two theory-driven surrogate loss families are proposed—Generalized Logit-Adjusted (GLA) loss and Generalized Class-Aware weighted (GCA) loss—providing stronger theoretical guarantees and improved empirical performance for multi-class classification under class imbalance.
- Incentivizing Time-Aware Fairness in Data Sharing
-
This paper proposes a time-aware data sharing framework that introduces new incentive conditions (F6–F8) and two reward schemes—Time-Aware Reward Cumulation and Time-Aware Data Valuation—to ensure that participants who join a collaboration earlier receive higher-value rewards, while simultaneously preserving fairness and individual rationality.
- Influence Functions for Edge Edits in Non-Convex Graph Neural Networks
-
This paper proposes influence functions for edge edits applicable to non-convex GNNs. By leveraging the proximal Bregman response function (PBRF), the method relaxes the convexity assumption and jointly accounts for both parameter shift and message propagation effects, supporting both edge deletion and insertion.
- It's Complicated: The Relationship of Algorithmic Fairness and Non-Discrimination Provisions for High-Risk Systems in the EU AI Act
-
This paper systematically analyzes the complex relationship between the non-discrimination provisions for high-risk AI systems in the EU AI Act (AIA) and the field of algorithmic fairness in machine learning. It reveals critical gaps in the legal text concerning input-side bias detection, the absence of output-side protections, and standardization challenges, providing a foundational framework for interdisciplinary collaboration between computer science and law.
- Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification
-
This paper systematically evaluates compression-based adversarial purification defenses and demonstrates that the realism of reconstructed images is the critical factor for robustness—high-realism compression models maintain significant robustness under strong adaptive attacks, and this robustness is not attributable to gradient masking.
- Learning-Augmented Facility Location Mechanisms for Envy Ratio
-
For the envy ratio objective in one-dimensional facility location, this paper designs both deterministic and randomized learning-augmented mechanisms: the deterministic \(\alpha\)-BIM achieves an optimal consistency–robustness tradeoff, while the randomized BAM further improves the guarantees. The paper also resolves an open problem posed by Ding et al., improving the approximation ratio of prediction-free randomized mechanisms from 2 to approximately 1.8944.
- Locally Optimal Private Sampling: Beyond the Global Minimax
-
Under local differential privacy (LDP), this paper proposes a local minimax framework that leverages neighborhood constraints defined by a public distribution \(P_0\) to derive closed-form optimal samplers, achieving consistent and significant improvements over the global minimax sampler both theoretically and empirically.
- Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research
-
This paper systematically identifies five fundamental mismatches between machine unlearning techniques and policy objectives in the context of generative AI, arguing that machine unlearning cannot serve as a universal solution for privacy, copyright, or safety concerns, and provides a practical conceptual framework for both ML researchers and policymakers.
- MARS: A Malignity-Aware Backdoor Defense in Federated Learning
-
This paper proposes MARS, a defense method that quantifies the malignity of local models by computing per-neuron Backdoor Energy (BE), and leverages Wasserstein distance-based clustering to effectively identify backdoor models in federated learning.
- Matchings Under Biased and Correlated Evaluations
-
This paper introduces a correlation parameter \(\gamma\) (the degree of alignment between institutional evaluations) into a two-institution stable matching model, and analyzes how bias \(\beta\) and correlation \(\gamma\) jointly affect the representation ratio of disadvantaged groups. It proves that even a slight loss of correlation can cause a sharp drop in representation, and characterizes the Pareto frontier of fairness interventions.
- Mitigating Disparate Impact of Differentially Private Learning through Bounded Adaptive Clipping
-
By introducing a tunable lower bound into adaptive gradient clipping (bounded adaptive clipping), this work prevents the clipping bound from shrinking excessively during training, thereby improving accuracy for minority groups and mitigating algorithmic unfairness under DP constraints.
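The idea reduces to one change in the adaptive-clipping update (a sketch under assumed names, not the paper's exact rule; the quantile tracking is simplified):

```python
def bounded_adaptive_clip(prev_bound, grad_norm_quantile, floor, lr=0.2):
    """One illustrative update step: move the clip bound toward an observed
    gradient-norm quantile, as in adaptive clipping, but never let it drop
    below a tunable floor. The floor is what keeps minority-group gradients
    from being crushed as the bound shrinks during training."""
    new_bound = (1 - lr) * prev_bound + lr * grad_norm_quantile
    return max(new_bound, floor)
```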
- Mitigating Privacy-Utility Trade-off in Decentralized Federated Learning via f-Differential Privacy
-
This paper proposes two privacy accounting methods for decentralized federated learning under the f-DP framework—PN-f-DP and Sec-f-LDP—which leverage hypothesis-testing-based privacy measures to consistently yield tighter privacy bounds than Rényi DP, thereby reducing noise injection and improving model utility under equivalent privacy guarantees.
- Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning
-
This paper proposes Per-layer Model Inversion (PMI) for data-free continual learning to accelerate synthetic image generation, and mitigates the feature drift between synthetic and real data via class-level Gaussian feature modeling and contrastive learning, achieving efficient and high-quality data-free knowledge replay.
- Multi-Class Support Vector Machine with Differential Privacy
-
This paper proposes the PMSVM framework, which exploits the single-pass data access property of all-in-one multi-class SVMs. By combining weight perturbation and gradient perturbation, PMSVM substantially reduces the privacy budget consumption of multi-class SVMs under differential privacy, achieving a superior privacy–utility trade-off.
- Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor
-
This paper presents the first hypothesis selection algorithm under the central differential privacy model that simultaneously achieves nearly-linear time complexity and the optimal approximation factor \(\alpha=3\), resolving an open problem posed by Bun et al. (NeurIPS 2019).
- Not All Deepfakes Are Created Equal: Triaging Audio Forgeries for Robust Deepfake Singer Identification
-
This paper proposes a two-stage pipeline grounded in the premise that the most harmful deepfakes are those of the highest quality. A discriminator first filters out low-quality forgeries to reduce noise; a singer identification model trained exclusively on genuine recordings then performs voiceprint matching. The pipeline consistently outperforms baselines across multiple datasets.
- OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction
-
OmniFC is a model-agnostic federated clustering framework that exactly reconstructs the global pairwise distance matrix over a finite field via Lagrange coded computing, allowing any centralized clustering method (K-Means, Spectral Clustering, DBSCAN, Hierarchical Clustering, etc.) to run directly on the reconstructed matrix. It requires only a single communication round, is inherently robust to non-IID data, and outperforms specialized methods such as k-FED, MUFC, and FedSC across 7 datasets.
- On the Hardness of Conditional Independence Testing In Practice
-
This paper systematically analyzes the root causes of failure in kernel-based conditional independence (CI) testing in practice: estimation error in conditional mean embeddings is identified as the central driver of Type-I error inflation, while the inherent tension between the choice of conditioning kernel \(k_C\)—which is critical for test power—and its tendency to exacerbate false positives is formally characterized.
- Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect
-
This paper establishes three foundational theoretical results for the weighted controlled direct effect (WCDE): necessary and sufficient conditions for unique identifiability, derivation of the influence function for nonparametric estimation, and characterization of the optimal covariate adjustment set that minimizes asymptotic variance.
- Perturbation Bounds for Low-Rank Inverse Approximations under Noise
-
This paper provides the first non-asymptotic spectral norm perturbation bounds for low-rank inverse approximations \(\|(\tilde{A}^{-1})_p - A_p^{-1}\|\) under additive noise. Using contour integration techniques, sharp bounds are derived that depend on the eigengap, spectral decay, and noise alignment, improving upon classical full-inverse bounds by up to a factor of \(\sqrt{n}\).
- Position: Bridge the Gaps between Machine Unlearning and AI Regulation
-
This paper systematically analyzes six potential application scenarios of Machine Unlearning (MU) in compliance with the EU AI Act (AIA), identifies the technical gaps between the state of the art and actual regulatory requirements in each scenario, and calls on the research community to bridge these gaps in order to realize the potential of MU in AI governance.
- Preserving Task-Relevant Information Under Linear Concept Removal
-
SPLINCE constructs an oblique projection that simultaneously guarantees linear guardedness (i.e., sensitive attributes cannot be predicted by any linear classifier) and exactly preserves the covariance between representations and task labels, thereby resolving the problem of existing concept erasure methods inadvertently removing task-relevant information alongside sensitive concepts.
- Private Continual Counting of Unbounded Streams
-
This paper proposes a novel matrix factorization method based on logarithmic perturbation, achieving for the first time a differentially private continual counting algorithm that simultaneously satisfies the three properties of "unbounded streams," "smooth error," and "near-optimal asymptotic error," with variance \(O(\log^{2+2\alpha}(t))\) at time step \(t\) for any \(\alpha > 0\).
- Private Zeroth-Order Optimization with Public Data
-
This paper proposes the PAZO framework, which leverages public data to guide gradient approximation in private zeroth-order optimization. PAZO achieves a superior privacy-utility tradeoff compared to DP-SGD on both vision and text tasks, while delivering up to a 16× speedup.
- Provable Watermarking for Data Poisoning Attacks
-
This paper proposes two provable watermarking schemes—post-poisoning watermarking and poisoning-concurrent watermarking—that provide transparency declaration mechanisms for data poisoning attacks. The theoretical analysis demonstrates that, under specific watermark length conditions, both watermark detectability and poisoning effectiveness can be simultaneously guaranteed.
- PubSub-VFL: Towards Efficient Two-Party Split Learning in Heterogeneous Environments via Publisher/Subscriber Architecture
-
This paper proposes PubSub-VFL, an efficient two-party vertical federated learning framework based on a publisher/subscriber architecture. Through a hierarchical asynchronous mechanism and system-profiling-based hyperparameter optimization, it achieves 2–7× training speedup and up to 91% computational resource utilization while preserving privacy and model accuracy.
- Reconstruction and Secrecy under Approximate Distance Queries
-
Under the approximate distance query model, this paper studies the reconstruction game from a learning-theoretic perspective, proves a geometric characterization of the optimal reconstruction error as the Chebyshev radius, and provides a complete classification of pseudo-finiteness for Euclidean convex spaces.
- Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
-
This paper proposes R2D (Rewind-to-Delete), the first first-order, black-box certified machine unlearning algorithm for general nonconvex loss functions. It achieves data deletion by rewinding to an earlier checkpoint in the training trajectory and then performing gradient descent on the retained data, while providing \((\varepsilon, \delta)\)-certified unlearning guarantees and theoretical trade-offs among privacy, utility, and efficiency.
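The mechanism can be sketched on a single scalar parameter (an illustrative skeleton under stated assumptions; the certified guarantee additionally requires calibrated noise at the end, which is omitted):

```python
def rewind_to_delete(checkpoints, rewind_step, retained_grad, steps=200, lr=0.1):
    """R2D skeleton: rewind to an earlier saved checkpoint, then resume
    gradient descent using only gradients computed on the retained data.
    `retained_grad(w)` is a hypothetical gradient oracle over the retained
    set; the noise addition for (eps, delta) certification is not shown."""
    w = checkpoints[rewind_step]
    for _ in range(steps):
        w = w - lr * retained_grad(w)
    return w
```

For a quadratic retained-data loss with minimum at 2.0 (gradient `w - 2.0`), resuming from any checkpoint converges back to the retained-data optimum, which is the behavior the certification argument builds on.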
- Robust Graph Condensation via Classification Complexity Mitigation
-
This paper reveals that graph condensation (GC) is fundamentally a process of reducing classification complexity, and that adversarial attacks precisely undermine this property. Based on this insight, the authors propose the MRGC framework, which enhances GC robustness through three manifold-based regularization modules: intrinsic dimensionality regularization, curvature-aware manifold smoothing, and inter-class manifold decoupling. This work represents the first systematic study of GC robustness under simultaneous perturbations of structure, features, and labels.
- Sequentially Auditing Differential Privacy
-
This paper proposes a differential privacy auditing framework based on sequential hypothesis testing and kernel MMD statistics, enabling valid detection of privacy violations at any point during streaming mechanism outputs. The approach reduces the required sample count from 50K (as needed by existing methods) to just a few hundred, and can identify DP-SGD privacy violations within less than one full training run.
- Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy
-
This paper establishes novel high-probability perturbation bounds for low-rank approximation of symmetric matrices under the spectral norm, improving upon the classical Eckart–Young–Mirsky theorem, and resolves an open problem in differentially private PCA.
- Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification
-
This paper proposes DPSBA, a clean-label backdoor attack framework for graph classification that generates in-distribution trigger subgraphs via adversarial training while suppressing both structural and semantic anomalies, achieving high attack success rates with significantly improved stealthiness.
- Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization
-
This paper proposes a novel search direction and proves that first-order and zeroth-order online bilevel optimization algorithms built upon it achieve sublinear stochastic bilevel regret guarantees without requiring window smoothing, while improving efficiency through reduced oracle dependence, parallel updates, and zeroth-order Hessian/Jacobian estimation.
- Taught Well, Learned Ill: Towards Distillation-Conditional Backdoor Attack
-
This paper proposes the Distillation-Conditional Backdoor Attack (DCBA) paradigm and its instantiation SCAR, which embeds a "dormant" backdoor into a teacher model via bi-level optimization. The backdoor remains undetectable on the teacher model but is activated and transferred to the student model during knowledge distillation, even when the distillation dataset is entirely clean.
- The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples
-
This paper identifies a critical security vulnerability in machine unlearning: even when an unlearned model is statistically indistinguishable from a retrained model, applying small adversarial perturbations to forgotten samples causes the unlearned model to correctly classify them while the retrained model fails — revealing a novel privacy risk termed "residual knowledge." The authors propose RURK, a fine-tuning strategy that penalizes correct predictions on perturbed forgotten samples, effectively suppressing residual knowledge across 11 unlearning methods on CIFAR-10 and ImageNet-100.
- Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits
-
This paper theoretically establishes that the adversarial robustness of Neural Probabilistic Circuits (NPC) depends solely on the attribute recognition model and is independent of the probabilistic circuit. Building on this finding, it proposes RNPC, which achieves provably improved robustness via class-wise inference aggregation, significantly enhancing adversarial robustness while maintaining benign accuracy.
- Understanding Challenges to the Interpretation of Disaggregated Evaluations of AI
-
Through causal graphical modeling, this paper demonstrates that performance disparities across subgroups in disaggregated evaluations do not necessarily indicate unfairness, but may instead reflect natural consequences of distributional differences in the data-generating process. The authors recommend supplementing standard disaggregated evaluations with causal assumptions and weighted evaluation methods.
- Unifying Proportional Fairness in Centroid and Non-Centroid Clustering
-
This paper unifies the study of proportional fairness in centroid and non-centroid clustering under a single "semi-centroid clustering" framework, establishes an impossibility theorem showing the two cannot be simultaneously achieved, and designs novel algorithms that attain constant-factor core guarantees under dual metric loss.
- Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
-
Under the f-DP framework grounded in hypothesis testing, this paper provides a unified characterization of three classes of privacy risks in differential privacy — re-identification, attribute inference, and data reconstruction — yielding tighter and consistent risk upper bounds that enable a 20% reduction in noise without compromising security guarantees.