🎯 Object Detection

💬 ACL2026 · 5 paper notes

E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition: This paper proposes E2E-GMNER, the first end-to-end GMNER framework that unifies entity recognition, semantic classification, visual grounding, and implicit knowledge reasoning within a single multimodal large language model (MLLM). The framework uses chain-of-thought (CoT) reasoning to adaptively judge how useful the visual and knowledge cues are, and introduces Gaussian Risk-aware Bounding box Perturbation (GRBP) to make generative bounding box prediction more robust.
Evaluating Memory Capability in Continuous Lifelog Scenario: This paper proposes LifeDialBench, a benchmark for evaluating memory capabilities in continuous lifelog scenarios, comprising EgoMem (7 days of real-world data) and LifeMem (1 year of simulated data). It introduces an online evaluation protocol that enforces temporal causality. Counterintuitively, a simple retrieval-augmented generation (RAG) baseline consistently outperforms all of the more complex memory systems evaluated.
Evolutionary Negative Module Pruning for Better LoRA Merging: This paper proposes ENMP, a method that leverages evolutionary search to identify and prune "negative modules" that degrade performance during LoRA merging. Designed as a plug-and-play enhancement, ENMP consistently improves existing merging algorithms across both NLP and vision domains.
GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization: This paper proposes GigaCheck, a dual-strategy framework: document-level classification with a fine-tuned LLM, and span-level detection that treats AI-generated text spans as "objects" and uses a DETR-like architecture to localize them end to end at the character level.
Retrievals Can Be Detrimental: Unveiling the Backdoor Vulnerability of Retrieval-Augmented Diffusion Models: This paper proposes BadRDM, the first backdoor attack framework targeting retrieval-augmented diffusion models (RDMs). By maliciously fine-tuning the retriever via contrastive learning, it establishes a shortcut from trigger tokens to toxic proxy images, achieving attack success rates of 90.9% and 96.4% on class-conditional and text-to-image (T2I) tasks respectively, while preserving benign generation quality.
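The E2E-GMNER note only names GRBP, so what follows is a minimal sketch of one plausible reading, not the paper's method: ground-truth boxes are jittered with Gaussian noise scaled to the box size before being serialized as text targets, so the model is not trained to reproduce a single exact coordinate string. The names `perturb_box` and `box_to_text`, the `sigma_scale` parameter, the normalized `(x1, y1, x2, y2)` convention, and the bracketed text format are all hypothetical, and the "risk-aware" weighting is deliberately left out because the note does not describe it.

```python
import random
from typing import Optional, Tuple

# (x1, y1, x2, y2) with coordinates normalized to [0, 1] (assumed convention)
Box = Tuple[float, float, float, float]


def _clip(v: float) -> float:
    """Keep a coordinate inside the normalized image frame."""
    return min(1.0, max(0.0, v))


def perturb_box(box: Box, sigma_scale: float = 0.05,
                rng: Optional[random.Random] = None) -> Box:
    """Add zero-mean Gaussian noise to each coordinate (hypothetical GRBP stand-in).

    The noise std is proportional to the box width/height; the paper's
    risk-aware weighting is NOT modeled here.
    """
    if rng is None:
        rng = random.Random()
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    nx1 = x1 + rng.gauss(0.0, sigma_scale * w)
    ny1 = y1 + rng.gauss(0.0, sigma_scale * h)
    nx2 = x2 + rng.gauss(0.0, sigma_scale * w)
    ny2 = y2 + rng.gauss(0.0, sigma_scale * h)
    # Clip to the frame and re-sort so (x1, y1) stays the top-left corner.
    nx1, nx2 = sorted((_clip(nx1), _clip(nx2)))
    ny1, ny2 = sorted((_clip(ny1), _clip(ny2)))
    return (nx1, ny1, nx2, ny2)


def box_to_text(box: Box) -> str:
    """Serialize a box as text for a generative model (hypothetical format)."""
    return "[" + ",".join(f"{v:.2f}" for v in box) + "]"
```

Used as training-time augmentation, a target would be built as `box_to_text(perturb_box(gt_box, rng=random.Random(0)))`; setting `sigma_scale=0.0` recovers the unperturbed box, which makes the noise level easy to ablate.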
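The online protocol in the LifeDialBench note can be illustrated with a small replay loop, assuming that enforcing temporal causality means a system may only consult records timestamped at or before each query. The `Event`, `Query`, `MemorySystem`, `KeywordRAG`, and `run_online` names are hypothetical; `KeywordRAG` is a toy word-overlap retriever standing in for a simple RAG baseline, not the benchmark's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Event:
    t: float    # timestamp of the lifelog record
    text: str


@dataclass(frozen=True)
class Query:
    t: float    # time at which the question is asked
    question: str


class MemorySystem(Protocol):
    """Hypothetical interface; not the benchmark's actual API."""
    def ingest(self, event: Event) -> None: ...
    def answer(self, query: Query) -> str: ...


class KeywordRAG:
    """Toy retrieval baseline: return the stored event with the largest word overlap."""

    def __init__(self) -> None:
        self.store: List[Event] = []

    def ingest(self, event: Event) -> None:
        self.store.append(event)

    def answer(self, query: Query) -> str:
        words = set(query.question.lower().split())
        best: Optional[Event] = max(
            self.store,
            key=lambda e: len(words & set(e.text.lower().split())),
            default=None,
        )
        return best.text if best is not None else ""


def run_online(system: MemorySystem, events: List[Event],
               queries: List[Query]) -> List[Tuple[Query, str]]:
    """Replay the stream in time order; each query sees only events with t <= query.t."""
    stream = sorted(events, key=lambda e: e.t)
    results: List[Tuple[Query, str]] = []
    i = 0
    for query in sorted(queries, key=lambda x: x.t):
        # Ingest everything up to (and including) the query time, nothing later.
        while i < len(stream) and stream[i].t <= query.t:
            system.ingest(stream[i])
            i += 1
        results.append((query, system.answer(query)))
    return results
```

Because queries are processed in time order and events are ingested lazily, a question asked on day 2 cannot retrieve a record from day 5, even when both sit in the same input list.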
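The ENMP note describes evolutionary search over which modules to prune before merging. A generic elitist genetic algorithm over binary keep/prune masks conveys the idea; it is a sketch under assumptions, not necessarily the paper's search procedure. `evolve_mask`, its hyperparameters, and the `fitness` callback are hypothetical: in practice `fitness` would run an existing merging algorithm on the kept LoRA modules and score the merged model on held-out data.

```python
import random
from typing import Callable, List

Mask = List[int]  # mask[i] == 1 keeps LoRA module i in the merge, 0 prunes it


def evolve_mask(n_modules: int, fitness: Callable[[Mask], float],
                pop_size: int = 20, generations: int = 50,
                mutation_rate: float = 0.1, seed: int = 0) -> Mask:
    """Generic elitist genetic algorithm over binary keep/prune masks.

    Requires n_modules >= 2 and pop_size >= 4. `fitness` is a hypothetical
    callback that merges the kept modules and scores the merged model.
    """
    rng = random.Random(seed)
    # Seed the population with the unpruned mask so the search can only improve on it.
    population: List[Mask] = [[1] * n_modules]
    population += [[rng.randint(0, 1) for _ in range(n_modules)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the elite
        children: List[Mask] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_modules)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each module toggles keep/prune with prob mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

For a quick check, a toy fitness that sums per-module scores such as `[0.5, -0.3, 0.8, -0.1, 0.4, 0.2]` over the kept modules rewards pruning the negative entries. Because the unpruned mask is in the initial population and the elite always survives, the returned mask never scores below "merge everything", which matches the plug-and-play framing; in practice each fitness call is a full merge-and-evaluate, which dominates the cost.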
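The GigaCheck note treats AI-generated spans as "objects". One way to picture this, assuming DETR's normalized `(center, width)` box parameterization carries over to one dimension, is to map character spans to 1-D boxes and score predictions with a DETR-style matching cost. The helper names, the use of plain IoU instead of DETR's generalized IoU, and the cost weights (borrowed from DETR's L1 and GIoU defaults of 5 and 2) are assumptions, not details confirmed by the note.

```python
from typing import Tuple

Span = Tuple[int, int]  # character offsets [start, end)


def span_to_box(span: Span, doc_len: int) -> Tuple[float, float]:
    """Character span -> normalized (center, width), the 1-D analogue of DETR's (cx, w)."""
    start, end = span
    return ((start + end) / 2 / doc_len, (end - start) / doc_len)


def box_to_span(box: Tuple[float, float], doc_len: int) -> Span:
    """Decode a predicted (center, width) back to clipped character offsets."""
    center, width = box
    start = round((center - width / 2) * doc_len)
    end = round((center + width / 2) * doc_len)
    return (max(0, start), min(doc_len, end))


def iou_1d(a: Span, b: Span) -> float:
    """Intersection-over-union of two character intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_cost(pred: Tuple[float, float], gt: Span, doc_len: int,
               l1_weight: float = 5.0, iou_weight: float = 2.0) -> float:
    """DETR-style pairwise cost (lower is better): weighted L1 on box params minus weighted IoU."""
    g = span_to_box(gt, doc_len)
    l1 = abs(pred[0] - g[0]) + abs(pred[1] - g[1])
    return l1_weight * l1 - iou_weight * iou_1d(box_to_span(pred, doc_len), gt)
```

In a full DETR-like pipeline these pairwise costs would fill a prediction-by-ground-truth matrix solved with Hungarian matching (for example `scipy.optimize.linear_sum_assignment`), and `box_to_span` would turn the matched predictions back into character-level spans.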