👥 Social Computing
🧪 ICML2026 · 8 paper notes
📌 Same area in other venues: 💬 ACL2026 (45) · 📷 CVPR2026 (5) · 🔬 ICLR2026 (10) · 🤖 AAAI2026 (11) · 🧠 NeurIPS2025 (18) · 📹 ICCV2025 (4)
🔥 Top topics: Alignment/RLHF ×2 · Multimodal/VLM ×2 · LLM ×2
- Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
-
This paper proposes alignment tampering: when the model to be aligned generates "high-quality but biased" and "low-quality but unbiased" responses, the pairwise preference labels in RLHF conflate quality with bias. This leads the reward model, PPO/DPO, and Best-of-N sampling to further amplify the undesired original biases.
- FLIPS: Instance-Fingerprinting for LLMs via Pseudo-Random Sequences
-
FLIPS generates unique model "fingerprint responses" from pseudo-random sequences derived from seeds known only to the model owner. The fingerprint cannot be eliminated even if an attacker fine-tunes or prunes the model, and detection in black-box query scenarios reaches a rate of \(> 99\%\) with a false positive rate of \(< 1\%\).
- IDO: Incongruity-Aware Distribution Optimization for Multimodal Fake News Detection
-
IDO explicitly models inter-modal incongruity as a learnable distribution optimization objective, pulling the multimodal embeddings of real news closer together while enlarging the incongruity of fake news. It improves F1 by 3-7% over SOTA on Weibo / Twitter / Fakeddit and generalizes better to unseen fake news.
- MIND: Multi-Rationale Integrated Discriminative Reasoning Framework for Multi-Modal Fake News
-
MIND is an explainable and robust discriminative framework for fake news detection that combines multi-view rationale generation with cross-rationale discriminative reasoning. By jointly using three types of LLM-generated rationales (fact-checking, modal consistency, and semantic plausibility), it achieves a 4-8% F1 improvement over SOTA on Weibo, Twitter, and Fakeddit.
- ObjEmbed: Towards Universal Multimodal Object Embeddings
-
ObjEmbed trains a universal object embedding model that aligns multimodal object representations by combining tasks such as detection, segmentation, retrieval, captioning, and classification. A single embedding matches or outperforms task-specific SOTA on 11 tasks, including OVD / OVS / Text2Image-Object / Open-Caption-Eval.
- SCOPE: Selective Conformal Optimized Pairwise LLM Judging
-
SCOPE eliminates positional bias in LLM judging via Bidirectional Preference Entropy (BPE) and combines it with Conformal Risk Control to achieve finite-sample FDR control. This provides statistically valid risk guarantees while maintaining high coverage (e.g., an FDR of 0.099 at 0.583 coverage vs. a vanilla FDR of 0.198 at 1.000 coverage).
- Self-Debias: Self-correcting for Debiasing Large Language Models
-
Self-Debias reframes LLM debiasing as "fair resource allocation of probability mass on autoregressive reasoning chains." It treats trajectory-level suffix margins as the resource units and uses Jain's fairness index to prevent resources from collapsing onto easy samples. Combined with cold-start SFT and online self-training driven by consistency filtering, it raises Qwen3-8B's average score across 8 fairness/utility benchmarks from 77.5 to 81.7 using only 20k labeled seeds, turning the "self-correction collapse" of base models into a stable +0.4 improvement.
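
For readers unfamiliar with Jain's fairness index, the following is a minimal sketch of the standard index over a list of non-negative allocations. How Self-Debias plugs suffix margins into the index is not described in this note, so the function name `jain_index` and the example margin values are purely illustrative assumptions.

```python
def jain_index(values):
    """Standard Jain's fairness index: (sum x)^2 / (n * sum x^2).

    For non-negative inputs the result lies in [1/n, 1]; 1 means a perfectly
    even allocation, 1/n means everything sits on a single item.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        # Convention chosen for this sketch: an all-zero allocation counts as even.
        return 1.0
    return (total * total) / (n * sum_sq)


# Hypothetical per-sample margins (not taken from the paper).
even = jain_index([1.0, 1.0, 1.0, 1.0])       # margin spread evenly
collapsed = jain_index([4.0, 0.0, 0.0, 0.0])  # margin collapsed onto one sample
print(even, collapsed)
```

An index close to 1 indicates that the margin is spread across samples, while a value near 1/n signals the kind of collapse onto a few samples that the summary describes.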
- The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-modal Divergence
-
This paper employs a measure-theoretic framework to elevate the InfoNCE loss to a deterministic "population energy" over representation distributions. It proves that the unimodal case is convex and converges to a unique Gibbs equilibrium, while the symmetric multimodal case exhibits persistent negative symmetric KL coupling, which geometrically necessitates a modality gap.
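
As a point of reference for the last note, here is a minimal sketch of the ordinary finite-batch InfoNCE loss that the paper starts from. It is not the paper's measure-theoretic "population energy"; the function name `info_nce`, the temperature value, and the toy similarity matrices are assumptions made for illustration.

```python
import math


def info_nce(sim, temperature=0.1):
    """Mean InfoNCE loss for a square similarity matrix.

    sim[i][j] is the similarity between anchor i and candidate j;
    the diagonal entry sim[i][i] is treated as the positive pair.
    """
    n = len(sim)
    losses = []
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])  # -log softmax of the positive pair
    return sum(losses) / n


# Toy inputs (hypothetical): no structure at all vs. well-aligned positives.
uniform = info_nce([[0.0, 0.0, 0.0] for _ in range(3)])
aligned = info_nce([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(uniform, aligned)
```

With uninformative similarities the loss equals log(n), and it falls toward zero as positives dominate negatives. The note's claims concern the large-sample limit of this objective, which the sketch does not attempt to reproduce.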