🔍 Information Retrieval & RAG

🎞️ ECCV2024 · 3 paper notes

Multi-Label Cluster Discrimination for Visual Representation Learning: This work proposes MLCD (Multi-Label Cluster Discrimination), which assigns multiple cluster pseudo-labels to each image and trains with a disambiguated multi-label classification loss. Pre-trained on LAION-400M, ViT models trained with MLCD outperform OpenCLIP, FLIP, and UNICOM across linear probe, zero-shot classification, and retrieval tasks.
OneRestore: A Universal Restoration Framework for Composite Degradation: OneRestore is a Transformer-based universal image restoration framework. Using a scene-descriptor-guided cross-attention mechanism and a composite degradation restoration loss, a single model adaptively handles low-light, haze, rain, snow, and their arbitrary combinations, and supports controllable restoration in both text and visual modes.
Towards Open-Ended Visual Recognition with Large Language Model: This paper proposes the OmniScient Model (OSM), a generative mask classifier built from a frozen CLIP-ViT, a trainable MaskQ-Former, and a frozen LLM (Vicuna-7B). It shifts visual recognition from "selecting categories from a predefined vocabulary" to "directly generating category names," removing the dependency on a predefined vocabulary during both training and testing. It outperforms DaTaSeg by 4.3 PQ on COCO panoptic segmentation.
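
To make the MLCD note above more concrete, here is a minimal sketch of how an image could receive several cluster pseudo-labels instead of one. It assumes, for illustration only, that the labels are the `k` cluster centroids with the highest cosine similarity to the image embedding; the function names, the value of `k`, and the toy vectors are hypothetical and are not taken from the paper's code.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k_cluster_labels(embedding, centroids, k=2):
    """Return the indices of the k centroids most similar to the embedding.

    Assumption: similarity is cosine similarity, and ties are broken by
    the lower centroid index.
    """
    e = l2_normalize(embedding)
    sims = []
    for idx, centroid in enumerate(centroids):
        c = l2_normalize(centroid)
        sims.append((sum(a * b for a, b in zip(e, c)), idx))
    sims.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in sims[:k]]


def multi_hot(labels, num_clusters):
    """Turn a list of cluster indices into a multi-hot target vector."""
    target = [0] * num_clusters
    for idx in labels:
        target[idx] = 1
    return target


# Toy data: three 2-D centroids and one image embedding (all hypothetical).
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
image_embedding = [0.9, 0.3]

labels = top_k_cluster_labels(image_embedding, centroids, k=2)
target = multi_hot(labels, len(centroids))
print(labels, target)
```

The multi-hot `target` is what a multi-label classification loss would then consume; the disambiguated loss itself is not sketched here because the note does not describe its form.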