👥 Social Computing

🧪 ICML2026 · 9 paper notes

📌 Same area in other venues: 📷 CVPR2026 (3) · 🔬 ICLR2026 (17) · 💬 ACL2026 (45) · 🤖 AAAI2026 (10) · 🧠 NeurIPS2025 (20) · 📹 ICCV2025 (4)

🔥 Top topics: Alignment/RLHF ×2 · Multimodal/VLM ×2 · LLM ×2

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases: This paper introduces alignment tampering: when the model being aligned produces "high-quality but biased" and "low-quality but unbiased" responses, the pairwise preference labels in RLHF conflate quality with bias. As a result, the reward model, PPO/DPO training, and Best-of-N sampling all further amplify the unwanted bias.
FLIPS: Instance-Fingerprinting for LLMs via Pseudo-Random Sequences: FLIPS generates unique model "fingerprint responses" by designing pseudo-random seed sequences known only to the model owner. The fingerprint remains detectable (detection rate > 99%, false positive rate < 1%) under black-box query scenarios even if the attacker fine-tunes or prunes the model.
IDO: Incongruity-Aware Distribution Optimization for Multimodal Fake News Detection: IDO explicitly models cross-modal incongruity as a learnable distribution optimization target, pulling the multimodal embeddings of real news closer together while pushing the incongruous modalities of fake news further apart. It achieves a 3-7% F1 gain over the previous SOTA on Weibo, Twitter, and Fakeddit and significantly improves generalization to unseen fake news.
MIND: Multi-Rationale Integrated Discriminative Reasoning Framework for Multi-Modal Fake News: MIND provides an explainable and robust discriminative framework for fake news detection by combining multi-view rationale generation with cross-rationale discriminative reasoning. By jointly leveraging three types of LLM-generated rationales (fact-checking, modal consistency, and semantic plausibility), it achieves a 4-8% F1 improvement over SOTA on Weibo, Twitter, and Fakeddit.
ObjEmbed: Towards Universal Multimodal Object Embeddings: ObjEmbed trains a universal object embedding model by aligning multimodal object representations across a combination of tasks: detection, segmentation, retrieval, captioning, and classification. A single embedding matches or exceeds task-specific SOTA across 11 tasks, such as OVD, OVS, Text2Image-Object, and Open-Caption-Eval.
SCOPE: Selective Conformal Optimized Pairwise LLM Judging: SCOPE eliminates position bias in LLM judging through Bidirectional Preference Entropy (BPE) and implements finite-sample FDR control via Conformal Risk Control, providing statistically valid risk guarantees while maintaining high coverage (an FDR of 0.099 at 0.583 coverage, versus a vanilla FDR of 0.198 at 1.000 coverage).
Self-Debias: Self-correcting for Debiasing Large Language Models: Self-Debias reframes the LLM debiasing problem as "fair resource allocation of probability mass over autoregressive reasoning chains." Using trajectory-level suffix margins as resource units and the Jain Fairness Index to prevent budget collapse on easy samples, combined with cold-start SFT and consistency-filtered online self-training, the method improves Qwen3-8B's average score across 8 fairness/utility benchmarks from 77.5 to 81.7 using only 20k labeled seeds. It flips the base model's tendency to "correct toward bias" (collapse) into a stable +0.4 gain.
The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-modal Divergence: This paper employs a measure-theoretic framework to elevate the InfoNCE loss to a deterministic "population energy" over representation distributions. It demonstrates that the unimodal case is convex and converges to a unique Gibbs equilibrium, whereas the symmetric multimodal case exhibits a persistent negative symmetric KL coupling, showing that a modality gap is a geometric necessity.
Three Years of r/ChatGPT: Societal Impact Evaluations from Social Media Data: The study analyzes 137,000 posts from the r/ChatGPT subreddit over three years (2022-12 to 2025-11) by decomposing them into interpretable features using Sparse Autoencoders (SAE). By fitting piecewise linear changepoints to track the temporal trajectory of each feature, the authors found that "emotional usage" (therapy, emotional attachment) surged following the release of GPT-4o. Furthermore, the proposed online monitoring algorithm, PuLSE, could have triggered alerts in October 2024, six months before OpenAI publicly acknowledged these impacts.
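
The Self-Debias note above mentions the Jain Fairness Index. As a reminder of what that quantity measures, here is a minimal sketch of the standard formula, J(x) = (Σx)² / (n · Σx²). The function name and the treatment of an all-zero input are assumptions made for illustration; how the paper applies the index to suffix margins is not shown here.

```python
def jain_fairness_index(allocations):
    """Jain's index: 1.0 when all allocations are equal, 1/n when one item holds everything."""
    values = list(allocations)
    if not values:
        raise ValueError("allocations must be non-empty")
    if any(v < 0 for v in values):
        raise ValueError("allocations must be non-negative")
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    if sum_of_squares == 0:
        # Assumed convention: an all-zero allocation counts as perfectly even.
        return 1.0
    return (total * total) / (len(values) * sum_of_squares)
```

An even split such as `[1, 1, 1, 1]` scores 1.0, while a split that concentrates everything on one of four items, such as `[4, 0, 0, 0]`, scores 0.25.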
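
For readers unfamiliar with the InfoNCE loss discussed in the Geometric Mechanics note above, the following is a minimal pure-Python sketch of the usual finite-batch form with cosine similarity. It is not the paper's population-level formulation; the function name, the default temperature, and the use of in-batch negatives are assumptions for illustration.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] pairs with anchors[i], other rows act as negatives.

    Assumes every vector is non-zero so it can be L2-normalized.
    """
    def normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    unit_anchors = [normalize(v) for v in anchors]
    unit_positives = [normalize(v) for v in positives]

    losses = []
    for i, anchor in enumerate(unit_anchors):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [
            sum(x * y for x, y in zip(anchor, candidate)) / temperature
            for candidate in unit_positives
        ]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_partition = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Per-sample loss: a log-partition term minus the positive-pair similarity.
        losses.append(log_partition - logits[i])
    return sum(losses) / len(losses)
```

The per-sample loss splits into a term that rewards similarity to the positive pair and a log-partition term that penalizes similarity to everything in the batch, which is the tension the note describes.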
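
The SCOPE note above does not spell out how Bidirectional Preference Entropy is computed. One plausible reading, sketched below purely as an assumption, is to query the judge with both presentation orders, average the probability that response A wins, and abstain when the binary entropy of that average is too high. All function names are hypothetical, and `max_entropy` is a hand-picked placeholder; in the paper the threshold would come from Conformal Risk Control, which this sketch does not implement.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def bidirectional_preference(p_a_when_a_first, p_a_when_a_second):
    """Average P(A wins) over both presentation orders and score its uncertainty."""
    p_a = 0.5 * (p_a_when_a_first + p_a_when_a_second)
    return p_a, binary_entropy(p_a)


def selective_judge(p_a_when_a_first, p_a_when_a_second, max_entropy):
    """Return "A" or "B", or None to abstain when the order-averaged verdict is too uncertain."""
    p_a, entropy = bidirectional_preference(p_a_when_a_first, p_a_when_a_second)
    if entropy > max_entropy:
        return None  # abstain: the judge's preference is not reliable enough
    return "A" if p_a >= 0.5 else "B"
```

Under this reading, a judge that flips its verdict when the order is swapped (for example 0.9 and 0.1) averages to 0.5, reaches the maximum entropy of 1 bit, and is filtered out, which is how abstention trades coverage for a lower error rate.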