Optimization & Theory
ACL2025 · 3 paper notes
Same area in other venues: CVPR2026 (22) · ICLR2026 (222) · ICML2026 (88) · AAAI2026 (21) · NeurIPS2025 (126) · ICCV2025 (7)
- Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race
Reveals a "race-blindness" side effect of alignment training: aligned LLMs stop representing "black" and "white" as racial concepts in ambiguous contexts, so safety guardrails are not activated and implicit bias rises from 64.1% to 91.4%. Counterintuitively, injecting race-aware activations into early layers (rather than unlearning) reduces implicit bias from 97.3% to 42.4%.
- AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment
Proposes AmbiK, a text-only dataset for detecting ambiguous instructions in kitchen environments. It contains 1,000 pairs of ambiguous and unambiguous instructions, categorized by three ambiguity types (user preference, common sense, and safety). Several conformal-prediction-based ambiguity detection methods are evaluated, and the results show that existing methods perform poorly on this benchmark.
- ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting
Proposes ScaleBiO, a fully first-order bilevel optimization algorithm based on a penalty-function reformulation. It is the first to apply bilevel optimization to data source reweighting for LLMs with 30B+ parameters, improving Qwen-2.5-32B by +9% on GSM8K and +5.8% on MATH.
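
The AmbiK entry above mentions conformal-prediction-based ambiguity detection. As a rough illustration of the general idea only, the sketch below uses plain split conformal prediction: a threshold is calibrated on nonconformity scores, and an instruction is flagged as ambiguous when more than one candidate action survives in the prediction set. The function names, the `1 - p` score, and all numbers are assumptions made for this example; they are not taken from the AmbiK paper or from the methods it evaluates.

```python
import math


def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal threshold from calibration nonconformity scores.

    cal_scores: hypothetical scores, here 1 - p(correct option), one per
    calibration example. alpha: target miscoverage rate.
    """
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every option.
        return math.inf
    return sorted(cal_scores)[rank - 1]


def prediction_set(option_probs, threshold):
    """Keep each option whose nonconformity score (1 - p) is within the threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= threshold]


def is_ambiguous(option_probs, threshold):
    """Flag the instruction as ambiguous when more than one option survives."""
    return len(prediction_set(option_probs, threshold)) > 1


# Made-up calibration scores and model confidences, for illustration only.
cal_scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
threshold = conformal_threshold(cal_scores, alpha=0.25)

clear_task = {"pour water into the kettle": 0.90, "pour water into the pot": 0.10}
vague_task = {"use the glass cup": 0.45, "use the ceramic mug": 0.42, "use the bowl": 0.13}

print(is_ambiguous(clear_task, threshold))
print(is_ambiguous(vague_task, threshold))
```

In this toy setup a single confident option yields a singleton prediction set, while two similarly plausible options both fall under the threshold, which is what triggers the ambiguity flag.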