🔄 Self-Supervised Learning
💬 ACL2026 · 1 paper note
📌 Same area in other venues: 📷 CVPR2026 (89) · 🔬 ICLR2026 (81) · 🧪 ICML2026 (28) · 🤖 AAAI2026 (16) · 🧠 NeurIPS2025 (33) · 📹 ICCV2025 (13)
- LLMSurgeon: Diagnosing Data Mixture of Large Language Models
LLMSurgeon formalizes the question "what data was this LLM trained on" as Data Mixture Surgery. It uses the soft confusion matrix of a proxy classifier to invert the domain distribution observed in the model's generated text, estimating pre-training data mixture proportions from model outputs alone.
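
To make the inversion idea concrete, here is a minimal sketch of one way such a step could work. It is not the paper's estimator, which this note does not spell out. It assumes a hypothetical soft confusion matrix `confusion[i][j]` (the average probability the proxy classifier assigns to domain `j` on text truly from domain `i`) and a hypothetical vector `predicted` (the classifier's average output over the LLM's generated text). The function name `estimate_mixture`, the EM-style update, and all numbers are illustrative assumptions.

```python
def estimate_mixture(confusion, predicted, iterations=1000):
    """Estimate mixture weights p such that predicted[j] ~= sum_i p[i] * confusion[i][j].

    Uses a standard EM update for mixture proportions with known components.
    This is an assumed stand-in technique, not the method from the paper.
    """
    n_domains = len(confusion)
    n_outputs = len(predicted)
    # Start from a uniform guess; the update keeps weights non-negative
    # and summing to one when `predicted` sums to one.
    mixture = [1.0 / n_domains] * n_domains
    for _ in range(iterations):
        # Classifier output distribution implied by the current mixture.
        expected = [
            sum(mixture[i] * confusion[i][j] for i in range(n_domains))
            for j in range(n_outputs)
        ]
        # Reweight each domain by how well it explains the observed outputs.
        mixture = [
            mixture[i] * sum(
                predicted[j] * confusion[i][j] / expected[j]
                for j in range(n_outputs)
                if expected[j] > 0
            )
            for i in range(n_domains)
        ]
    return mixture


# Hypothetical three-domain example (all values made up for illustration).
confusion = [
    [0.8, 0.1, 0.1],  # true domain 0
    [0.2, 0.7, 0.1],  # true domain 1
    [0.1, 0.2, 0.7],  # true domain 2
]
true_mixture = [0.5, 0.3, 0.2]
# Simulate what the classifier would report on text drawn from true_mixture.
predicted = [
    sum(true_mixture[i] * confusion[i][j] for i in range(3)) for j in range(3)
]
estimate = estimate_mixture(confusion, predicted)
```

In this noise-free toy setting the estimate should approach `true_mixture`. With real classifier outputs, the result would depend on how well the confusion matrix measured on reference data transfers to generated text.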