👥 Social Computing

📷 CVPR2026 · 3 paper notes

Bridging Pixels and Words: Mask-Aware Local Semantic Fusion for Multimodal Media Verification: Proposes MaLSF, which uses mask-label pairs as semantic anchors and actively detects local semantic conflicts between image and text through two modules, Bidirectional Cross-modal Verification (BCV) and Hierarchical Semantic Aggregation (HSA). It achieves state-of-the-art (SOTA) results on DGM4 and on fake news detection tasks.
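
To make "bidirectional" verification concrete, here is a minimal sketch under stated assumptions: each mask-label pair is reduced to one region embedding, each caption word to one text embedding, and a conflict is flagged when an item finds no sufficiently similar partner in the other modality. This is a plain max-cosine matching check, not MaLSF's learned BCV module; the embeddings, the function names, and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bidirectional_conflicts(
    regions: Dict[str, Vector],  # hypothetical: mask label -> region embedding
    words: Dict[str, Vector],    # hypothetical: caption token -> text embedding
    threshold: float = 0.5,      # hypothetical similarity cutoff
) -> Tuple[List[str], List[str]]:
    """Return (regions with no matching word, words with no matching region)."""
    # Image -> text: every masked region should be grounded by some word.
    unsupported_regions = [
        label for label, r in regions.items()
        if max(cosine(r, w) for w in words.values()) < threshold
    ]
    # Text -> image: every word should be grounded by some masked region.
    unsupported_words = [
        token for token, w in words.items()
        if max(cosine(w, r) for r in regions.values()) < threshold
    ]
    return unsupported_regions, unsupported_words
```

For example, regions `{"person", "dog"}` against the caption words `{"man", "cat"}` would flag `dog` on the image side and `cat` on the text side, localizing the mismatch instead of returning a single global score. Checking both directions matters because an inserted object and an altered word each show up on only one side.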
Instance-level Visual Active Tracking with Occlusion-Aware Planning: OA-VAT builds discriminative "instance prototypes" offline from a single reference image so that the tracker can resist similar-looking distractors. Online, it refines these prototypes with an exponential moving average (EMA) and applies confidence-adaptive Kalman filtering to keep tracking stable; it also trains a diffusion trajectory planner conditioned on the target box to actively bypass obstacles and recover the target after occlusion. It reports an average success rate (SR) of 0.93 on UnrealCV, 90.8% average CAR on real images, and 81.6% TSR on real UAVs, while running at 35 FPS on an RTX 3090.
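
The two online stabilizers can be illustrated with a small sketch. It assumes the prototype is a plain feature vector updated by EMA only on confident frames, and that "confidence-adaptive" means the Kalman measurement noise R is scaled by 1 / confidence on a constant-velocity model of one box coordinate. The gating rule, the scaling rule, `ema_update`, `ConfidenceAdaptiveKF1D`, and all noise values are hypothetical, not OA-VAT's actual formulation, and the diffusion planner is not shown.

```python
from dataclasses import dataclass, field
from typing import List


def ema_update(proto: List[float], feat: List[float], conf: float,
               momentum: float = 0.1, min_conf: float = 0.5) -> List[float]:
    """Blend a new appearance feature into the instance prototype via EMA.

    Low-confidence frames are skipped so occluders or look-alike
    distractors do not drift the prototype (hypothetical gating rule).
    """
    if conf < min_conf:
        return list(proto)
    return [(1.0 - momentum) * p + momentum * f for p, f in zip(proto, feat)]


@dataclass
class ConfidenceAdaptiveKF1D:
    """Constant-velocity Kalman filter on one box coordinate (e.g. center x).

    Measurement noise R is scaled by 1 / confidence, so uncertain
    detections pull the state estimate less (hypothetical scaling rule).
    """
    x: List[float] = field(default_factory=lambda: [0.0, 0.0])  # [position, velocity]
    P: List[List[float]] = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])
    q: float = 0.01       # process noise added to the diagonal of P
    base_r: float = 1.0   # measurement noise at confidence 1.0

    def predict(self, dt: float = 1.0) -> float:
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.x[0]

    def update(self, z: float, conf: float) -> float:
        r = self.base_r / max(conf, 1e-3)  # low confidence -> larger R
        (a, b), (c, d) = self.P
        s = a + r                          # innovation variance with H = [1, 0]
        k0, k1 = a / s, c / s              # Kalman gain
        y = z - self.x[0]                  # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1.0 - k0) * a, (1.0 - k0) * b], [c - k1 * a, d - k1 * b]]
        return self.x[0]
```

With the same prediction, a detection at position 10 moves the estimate much further when its confidence is 1.0 than when it is 0.1, which is the behavior one would want while the target is partially occluded.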
Revisiting Unknowns: Towards Effective and Efficient Open-Set Active Learning: Proposes E2OAL, a detector-free open-set active learning framework. It discovers the latent structure of unknown classes through label-guided clustering, jointly models known and unknown categories with a Dirichlet-calibrated auxiliary head, and uses a two-stage adaptive querying strategy, achieving high accuracy, high query purity, and high training efficiency at the same time across multiple benchmarks.
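
A Dirichlet head paired with a two-stage query can be sketched as follows. The sketch swaps in a standard evidential formulation (evidence = softplus(logit), concentration α = evidence + 1, uncertainty u = K / Σα) and assumes the head emits K known-class logits plus one extra "unknown" logit. Stage 1 drops samples whose expected unknown probability is too high, to keep queries pure; stage 2 ranks the survivors by uncertainty. The function names, the 0.3 purity threshold, and the ranking rule are hypothetical, not E2OAL's actual design, and the label-guided clustering step is omitted.

```python
import math
from typing import Dict, List, Tuple


def softplus(x: float) -> float:
    """Numerically stable softplus, turning a logit into non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_stats(logits: List[float]) -> Tuple[List[float], float]:
    """Expected class probabilities and uncertainty u = K / sum(alpha).

    Assumes the last entry is a hypothetical auxiliary 'unknown' slot.
    """
    alpha = [softplus(z) + 1.0 for z in logits]  # Dirichlet concentration
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = len(alpha) / strength          # high when total evidence is low
    return probs, uncertainty


def two_stage_query(pool: Dict[str, List[float]], budget: int,
                    max_unknown_prob: float = 0.3) -> List[str]:
    """Stage 1: drop likely-unknown samples; stage 2: pick the most uncertain."""
    candidates = []
    for sample_id, logits in pool.items():
        probs, u = dirichlet_stats(logits)
        if probs[-1] <= max_unknown_prob:        # purity filter on the unknown slot
            candidates.append((u, sample_id))
    candidates.sort(reverse=True)                # most uncertain first
    return [sample_id for _, sample_id in candidates[:budget]]
```

Filtering before ranking keeps an uncertainty-only criterion from spending the labeling budget on unknown-class samples, which tend to look highly uncertain but contribute nothing to the known-class classifier.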