Optimization & Theory

💬 ACL2025 · 3 paper notes

📌 Same area in other venues: 📷 CVPR2026 (22) · 🔬 ICLR2026 (222) · 🧪 ICML2026 (88) · 🤖 AAAI2026 (21) · 🧠 NeurIPS2025 (126) · 📹 ICCV2025 (7)

Aligned but Blind: Alignment Increases Implicit Bias by Reducing Awareness of Race

Reveals a "race-blindness" side effect of alignment training: aligned LLMs fail to represent "black/white" as racial concepts in ambiguous contexts, so their safety guardrails are not activated and implicit bias rises from 64.1% to 91.4%. Counterintuitively, injecting race-aware activations into early layers (rather than unlearning) reduces implicit bias from 97.3% to 42.4%.
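
To make "injecting race-aware activations" concrete, the following is a minimal toy sketch of activation injection on plain Python lists. It assumes a difference-of-means direction between activations from race-explicit and ambiguous prompts; that construction, the function names, and the `strength` parameter are illustrative assumptions, not details confirmed by the paper.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(aware_acts, ambiguous_acts):
    """Assumed construction: mean(race-explicit) minus mean(ambiguous)."""
    mu_aware = mean_vector(aware_acts)
    mu_ambiguous = mean_vector(ambiguous_acts)
    return [a - b for a, b in zip(mu_aware, mu_ambiguous)]


def inject(hidden, direction, strength=1.0):
    """Add the scaled direction to one hidden state (e.g. at an early layer)."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy 2-dimensional activations, purely illustrative.
aware = [[1.0, 2.0], [3.0, 4.0]]
ambiguous = [[0.0, 1.0], [2.0, 1.0]]
direction = steering_direction(aware, ambiguous)
steered = inject([0.5, 0.5], direction, strength=2.0)
```

In a real model the same addition would be applied to the residual stream of selected early layers during the forward pass; which layers and what strength to use are empirical choices.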

AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

Proposes AmbiK, a text-only dataset dedicated to detecting ambiguous instructions in kitchen environments. It contains 1,000 pairs of ambiguous/unambiguous instructions categorized by three ambiguity types (user preference, common sense, and safety). Multiple conformal prediction-based ambiguity detection methods are evaluated, revealing that existing methods perform poorly on this benchmark.
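
As a rough illustration of how conformal prediction can flag an ambiguous instruction, here is a generic split-conformal sketch. It is not necessarily one of the methods evaluated in the paper: the nonconformity score `1 - p`, the function names, and the toy probabilities are all assumptions made for the example.

```python
import math


def conformal_threshold(cal_true_probs, alpha=0.25):
    """Calibrate a threshold on nonconformity scores 1 - p(true option)."""
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    # Finite-sample corrected rank for a target coverage of 1 - alpha.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: keep every option
    return scores[rank - 1]


def prediction_set(option_probs, qhat):
    """Keep every candidate action whose score 1 - p is within the threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]


def is_ambiguous(option_probs, qhat):
    """Flag the instruction when more than one action remains plausible."""
    return len(prediction_set(option_probs, qhat)) > 1


# Toy calibration data: model probability of the correct action per example.
qhat = conformal_threshold([0.9, 0.4, 0.7, 0.3, 0.95, 0.85, 0.5], alpha=0.25)

unclear = {"bread knife": 0.45, "chef's knife": 0.44, "butter knife": 0.11}
clear = {"bread knife": 0.90, "chef's knife": 0.07, "butter knife": 0.03}
flags = (is_ambiguous(unclear, qhat), is_ambiguous(clear, qhat))
```

A prediction set with several surviving actions is read as a signal to ask the user for clarification; a singleton set means the model can act directly.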

ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

ScaleBiO proposes a fully first-order bilevel optimization algorithm based on a penalty-function reformulation. It applies bilevel optimization to data source reweighting for 30B+ parameter LLMs for the first time, improving Qwen-2.5-32B by +9% on GSM8K and +5.8% on MATH.
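
For readers unfamiliar with penalty-based bilevel optimization, the block below sketches the standard form of such a reformulation for data reweighting. The symbols (source weights \lambda, per-source losses L_i, penalty coefficient \alpha) are assumptions chosen for illustration; the paper's exact objective and weight parameterization may differ.

```latex
% Bilevel form: choose source weights \lambda so that the model trained on
% the reweighted data performs well on validation data.
\min_{\lambda} \; L_{\mathrm{val}}\bigl(w^{*}(\lambda)\bigr)
\quad \text{s.t.} \quad
w^{*}(\lambda) \in \arg\min_{w} L_{\mathrm{train}}(w, \lambda),
\qquad
L_{\mathrm{train}}(w, \lambda) = \sum_{i=1}^{K} \lambda_i \, L_i(w)

% Penalty form: replace the constraint with a penalty on the lower-level
% optimality gap, which is nonnegative and zero only at a lower-level minimizer.
\min_{\lambda, w} \; L_{\mathrm{val}}(w)
+ \alpha \Bigl( L_{\mathrm{train}}(w, \lambda)
- \min_{u} L_{\mathrm{train}}(u, \lambda) \Bigr),
\qquad \alpha > 0
```

The penalized objective can be optimized with gradients of L_val and L_train alone, with no Hessian or Hessian-vector products, which is what "fully first-order" refers to and what makes the approach practical at this model scale.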