🔗 Causal Inference

📹 ICCV2025 · 2 paper notes

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets: This paper proposes a block-based diffusion method that combines LLMs with diffusion models to automatically generate high-quality counterfactual image-text pairs, together with a set-aware loss function. Without manual annotation, the approach significantly improves CLIP's compositional reasoning ability, surpassing state-of-the-art methods on ARO, VL-Checklist, and other benchmarks while using substantially less data.
Social Debiasing for Fair Multi-modal LLMs: This paper constructs CMSC, a large-scale counterfactual dataset spanning 18 social concepts, and proposes the Anti-Stereotype Debiasing (ASD) strategy, which combines bias-aware data resampling with a Social Fairness Loss. ASD effectively reduces social bias across four MLLM architectures with negligible degradation of general multimodal capability.