👥 Social Computing
🧪 ICML2025 · 6 paper notes
🔥 Top topics: LLM ×2
- DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
DEFAME is a modular, zero-shot multimodal LLM pipeline for fact-checking. It combines a six-stage dynamic workflow (Plan -> Execute -> Summarize -> Develop -> Predict -> Justify) with external multimodal tools for evidence retrieval, which enables end-to-end joint text-image fact-checking and sets a new state of the art on three benchmarks: AVeriTeC, MOCHEG, and VERITE.
- Dynamical Phases of Short-Term Memory Mechanisms in RNNs
This work identifies two distinct dynamical mechanisms that support short-term memory in RNNs: slow-point (SP) manifolds and limit cycles (LC). Using toy models, it analytically derives power-law scaling laws for the maximum learning rate at which each mechanism remains learnable (SP: \(\beta \approx 4\text{-}5\); LC: \(\beta \approx 2\text{-}3\)), and validates them empirically at scale by training approximately 80,000 RNNs.
- Learning Survival Distributions with the Asymmetric Laplace Distribution
This paper proposes a parametric survival analysis method based on the asymmetric Laplace distribution (ALD). A neural network learns the three ALD parameters (location, scale, and asymmetry), which yields a continuous, closed-form estimate of the survival distribution. The method outperforms existing parametric and non-parametric approaches in both discrimination and calibration.
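To illustrate what "closed-form" means for the ALD entry above, here is a minimal sketch of the ALD survival function and its inverse. It assumes the common quantile parameterization (location `loc`, scale `scale`, asymmetry `q` in (0, 1)), which may differ from the paper's own; the function names are hypothetical, and in the paper's setting the three parameters would come from a neural network rather than being fixed by hand.

```python
import math


def ald_survival(t, loc, scale, q):
    """Survival function S(t) = 1 - F(t) of an asymmetric Laplace distribution.

    Assumed parameterization (not taken from the paper): `loc` is the q-th
    quantile, `scale` > 0, and `q` in (0, 1) controls the asymmetry.
    """
    if scale <= 0 or not 0.0 < q < 1.0:
        raise ValueError("scale must be positive and q must lie in (0, 1)")
    z = (t - loc) / scale
    if z <= 0:
        # Left branch: F(t) = q * exp((1 - q) * z)
        return 1.0 - q * math.exp((1.0 - q) * z)
    # Right branch: F(t) = 1 - (1 - q) * exp(-q * z)
    return (1.0 - q) * math.exp(-q * z)


def ald_quantile(p, loc, scale, q):
    """Inverse CDF: the time t at which F(t) = p, for p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    if p <= q:
        return loc + scale / (1.0 - q) * math.log(p / q)
    return loc - scale / q * math.log((1.0 - p) / (1.0 - q))
```

Because both functions are closed-form, survival probabilities and quantiles such as the median can be read off directly from the three predicted parameters, without numerical integration.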
- OR-Bench: An Over-Refusal Benchmark for Large Language Models
This work proposes OR-Bench, the first large-scale over-refusal benchmark for LLMs. It contains 80K safe prompts that are prone to being falsely refused, revealing a strong trade-off between safety and over-refusal with a Spearman correlation coefficient of up to 0.89.
- Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
This paper proposes the GETA framework, which integrates Computerized Adaptive Testing (CAT) from psychometrics with Automatic Item Generation (AIG). Utilizing a variational IRT model and an LLM-driven item generator, GETA dynamically probes the value boundaries of LLMs to address the "evaluation chronoeffect" (data leakage and difficulty saturation) inherent in static benchmarks.
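The adaptive-testing idea in the GETA entry above can be sketched with a textbook two-parameter logistic (2PL) IRT model. This is an assumption for illustration only: the summary does not say which IRT form GETA uses, and the sketch omits the variational inference and the LLM item generator. All names are hypothetical; each item is a `(discrimination, difficulty)` pair.

```python
import math


def p_pass(theta, a, b):
    """2PL probability that an examinee with ability `theta` passes an item
    with discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Fisher information the item contributes at ability `theta`."""
    p = p_pass(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, items):
    """Index of the most informative item at the current ability estimate.

    This is the classic CAT selection rule: probe where the examinee's
    outcome is most uncertain.
    """
    return max(range(len(items)),
               key=lambda i: fisher_information(theta, *items[i]))
```

With equal discriminations, the rule picks the item whose difficulty is closest to the current ability estimate, which is how adaptive testing avoids items that are too easy or too hard for the model being evaluated.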
- When Bad Data Leads to Good Models
This paper proposes a "pre-training/post-training co-design" perspective, demonstrating through controlled experiments that incorporating a moderate amount of toxic data (~10%) into pre-training data actually reduces the entanglement of toxic features. This makes the model easier to detoxify during post-training (e.g., via ITI activation steering), ultimately reducing toxicity on ToxiGen from 41.40 to 2.63 while maintaining language capabilities.