👥 Social Computing

🧪 ICML2026 · 8 paper notes

📌 Same area in other venues: 💬 ACL2026 (45) · 📷 CVPR2026 (5) · 🔬 ICLR2026 (10) · 🤖 AAAI2026 (11) · 🧠 NeurIPS2025 (18) · 📹 ICCV2025 (4)

🔥 Top topics: Alignment/RLHF ×2 · Multimodal/VLM ×2 · LLM ×2

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

This paper proposes alignment tampering: when the model to be aligned generates "high-quality but biased" and "low-quality but unbiased" responses, the pairwise preference labels in RLHF conflate quality with bias. This leads the reward model, PPO/DPO, and Best-of-N sampling to further amplify the undesired original biases.

FLIPS: Instance-Fingerprinting for LLMs via Pseudo-Random Sequences

FLIPS generates unique model "fingerprint responses" by designing pseudo-random seed sequences (seeds known only to the model owner). Even if an attacker fine-tunes or prunes the model, the fingerprint cannot be eliminated, achieving a detection rate \(> 99\%\) and a false positive rate \(< 1\%\) in black-box query scenarios.

IDO: Incongruity-Aware Distribution Optimization for Multimodal Fake News Detection

IDO explicitly models inter-modal incongruity as a learnable distribution optimization objective, pulling the multimodal embeddings of real news closer together while enlarging the incongruity of fake news. It improves F1 by 3-7% over SOTA on Weibo / Twitter / Fakeddit and generalizes better to unseen fake news.

MIND: Multi-Rationale Integrated Discriminative Reasoning Framework for Multi-Modal Fake News

MIND provides an explainable and robust discriminative framework for fake news detection by combining multi-view rationale generation with cross-rationale discriminative reasoning. It uses three types of LLM-generated rationales at once (fact-checking, modal consistency, and semantic plausibility) and achieves a 4-8% F1 improvement over SOTA on Weibo, Twitter, and Fakeddit.

ObjEmbed: Towards Universal Multimodal Object Embeddings

ObjEmbed trains a universal object embedding model—aligning multimodal object representations by combining tasks such as detection, segmentation, retrieval, captioning, and classification. A single embedding outperforms or matches task-specific SOTA on 11 tasks, including OVD / OVS / Text2Image-Object / Open-Caption-Eval.

SCOPE: Selective Conformal Optimized Pairwise LLM Judging

SCOPE eliminates positional bias in LLM judging via Bidirectional Preference Entropy (BPE) and combines it with Conformal Risk Control to achieve finite-sample FDR control—providing statistically valid risk guarantees while maintaining high coverage (e.g., FDR of only 0.099 at 0.583 coverage vs. Vanilla FDR of 0.198 at 1.000 coverage).
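
To make the selective-judging idea concrete, here is a minimal sketch. The summary above does not give the paper's exact definition of BPE, so the form used here is an assumption: query the judge twice with the response order swapped, average the two probabilities, and use the binary entropy of that average as the uncertainty score. The function names and the fixed `max_entropy` threshold are hypothetical; in SCOPE the threshold would come from Conformal Risk Control calibration rather than being set by hand.

```python
import math
from typing import Optional


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def bidirectional_preference(p_ab: float, p_ba: float) -> tuple[float, float]:
    """Combine two judge calls made with the response order swapped.

    p_ab: judge's probability that A wins when A is shown first.
    p_ba: judge's probability that A wins when B is shown first.
    Returns the order-averaged probability that A wins and its entropy.
    (Assumed form; the paper's BPE definition may differ.)
    """
    p = 0.5 * (p_ab + p_ba)
    return p, binary_entropy(p)


def selective_judge(p_ab: float, p_ba: float, max_entropy: float) -> Optional[str]:
    """Return "A" or "B", or None to abstain when the judge is too uncertain."""
    p, h = bidirectional_preference(p_ab, p_ba)
    if h > max_entropy:
        return None  # abstain: verdict depends too much on order or is near 50/50
    return "A" if p >= 0.5 else "B"
```

A judge that flips its verdict when the order is swapped (for example `p_ab = 0.9`, `p_ba = 0.1`) averages to 0.5, gets the maximum entropy of 1 bit, and is therefore the first kind of comparison to be abstained on. Abstaining on such cases is what trades coverage for a lower error rate among the verdicts that are kept.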

Self-Debias: Self-correcting for Debiasing Large Language Models

Self-Debias reframes LLM debiasing as "fair resource allocation of probability mass on autoregressive reasoning chains." Using trajectory-level suffix margins as resource units, it employs Jain's fairness index to prevent resources from collapsing onto easy samples. Combined with cold-start SFT and online self-training driven by consistency filtering, it raises Qwen3-8B's average score across 8 fairness/utility benchmarks from 77.5 to 81.7 with only 20k labeled seeds, and turns the "self-correction collapse" of base models into a stable +0.4 improvement.
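
For reference, Jain's fairness index of non-negative allocations \(x_1, \dots, x_n\) is \((\sum_i x_i)^2 / (n \sum_i x_i^2)\); it equals 1 when all allocations are equal and \(1/n\) when one item receives everything. The sketch below computes only this standard index. How Self-Debias turns suffix margins into non-negative allocations is not described in the summary, so the input list is a hypothetical stand-in.

```python
from typing import Sequence


def jain_index(allocations: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1].

    Assumes non-negative allocations. An all-zero input is treated as
    perfectly even (returns 1.0), which is a convention chosen for this sketch.
    """
    n = len(allocations)
    if n == 0:
        raise ValueError("allocations must be non-empty")
    if any(x < 0 for x in allocations):
        raise ValueError("allocations must be non-negative")
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    if sum_sq == 0:
        return 1.0
    return (total * total) / (n * sum_sq)
```

With four samples, an even allocation such as `[1, 1, 1, 1]` scores 1.0, while `[1, 0, 0, 0]` scores 0.25, the collapsed case that the index is used to penalize.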

The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-modal Divergence

This paper employs a measure-theoretic framework to elevate the InfoNCE loss to a deterministic "population energy" over representation distributions. It proves that the unimodal case is convex and converges to a unique Gibbs equilibrium, while the symmetric multimodal case exhibits persistent negative symmetric KL coupling, which geometrically necessitates a modality gap.
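
As background, the commonly used finite-sample form of the InfoNCE loss is written out below. The notation is an assumption and may differ from the paper's: \(s\) is a similarity function, \(\tau\) a temperature, \(y^{+}\) the positive for anchor \(x\), and \(y_1, \dots, y_N\) the candidate set, which includes \(y^{+}\).

```latex
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\,\mathbb{E}\!\left[
      \log \frac{\exp\!\big(s(x, y^{+})/\tau\big)}
                {\sum_{j=1}^{N} \exp\!\big(s(x, y_j)/\tau\big)}
    \right]
  = \underbrace{-\,\mathbb{E}\!\left[\frac{s(x, y^{+})}{\tau}\right]}_{\text{alignment}}
  \;+\;
    \underbrace{\mathbb{E}\!\left[\log \sum_{j=1}^{N}
      \exp\!\big(s(x, y_j)/\tau\big)\right]}_{\text{dispersion}}
```

The second equality is an algebraic identity: the first term rewards similarity between positive pairs, and the log-sum-exp term penalizes similarity to all candidates. The "alignment" and "dispersion" labels are this note's gloss on the title, not the paper's own definitions.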