
🎯 Object Detection

💬 ACL2026 · 5 paper notes

E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition

This paper proposes E2E-GMNER, the first end-to-end GMNER framework that unifies entity recognition, semantic classification, visual grounding, and implicit knowledge reasoning within a single multimodal large language model. The framework uses chain-of-thought (CoT) reasoning to adaptively judge whether visual and knowledge cues are useful, and introduces Gaussian Risk-aware Bounding box Perturbation (GRBP) to make generative bounding box prediction more robust.
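The note does not spell out how GRBP works. As a rough illustration of the general idea of Gaussian box perturbation, here is a minimal sketch that jitters a ground-truth box with noise scaled by its width and height, plus an IoU helper to measure how far the perturbed target drifts. The function names, the `sigma_ratio` parameter, and the size-proportional noise are assumptions, not the paper's formulation; in particular, the "risk-aware" weighting is not modeled.

```python
import random


def perturb_box(box, sigma_ratio=0.05, rng=None):
    """Jitter an (x1, y1, x2, y2) box with Gaussian noise scaled by its size.

    Hypothetical sketch: sigma_ratio and size-proportional noise are assumptions,
    not GRBP's actual definition.
    """
    rng = rng or random.Random()
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # Per-coordinate Gaussian offsets proportional to the box width/height.
    nx1 = x1 + rng.gauss(0.0, sigma_ratio * w)
    ny1 = y1 + rng.gauss(0.0, sigma_ratio * h)
    nx2 = x2 + rng.gauss(0.0, sigma_ratio * w)
    ny2 = y2 + rng.gauss(0.0, sigma_ratio * h)
    # Re-order corners in case the noise flipped them, so the box stays valid.
    nx1, nx2 = min(nx1, nx2), max(nx1, nx2)
    ny1, ny2 = min(ny1, ny2), max(ny1, ny2)
    return (nx1, ny1, nx2, ny2)


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    gt = (10.0, 20.0, 110.0, 220.0)
    noisy = perturb_box(gt, rng=random.Random(0))
    print(noisy, iou(gt, noisy))
```

Training on such jittered targets is one common way to keep a generative model from overfitting to exact coordinate tokens; whether GRBP applies the noise to targets, inputs, or a loss weight is not stated here.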

Evaluating Memory Capability in Continuous Lifelog Scenario

This paper proposes LifeDialBench, a benchmark for evaluating memory capabilities in continuous lifelog scenarios, comprising EgoMem (7 days of real-world data) and LifeMem (1 year of simulated data). It introduces an online evaluation protocol that enforces temporal causality, so a question can draw only on information recorded before it is asked. Counterintuitively, a simple RAG baseline consistently outperforms all of the more complex memory systems evaluated.
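To make the online, temporally causal setup concrete, here is a minimal sketch: the log is replayed in time order, and each query is answered by a retriever that can only see events ingested before the query's timestamp. The `Event` type, the integer timestamps, and the word-overlap retriever are all hypothetical stand-ins; a real RAG baseline would use a learned embedding model and an LLM reader.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: int  # assumed unit, e.g. minutes since the log started
    text: str


def tokenize(s):
    return set(s.lower().split())


def retrieve(memory, query, k=2):
    # Toy lexical retriever: rank stored events by word overlap with the query.
    q = tokenize(query)
    ranked = sorted(memory, key=lambda e: len(q & tokenize(e.text)), reverse=True)
    return ranked[:k]


def online_eval(stream, queries, k=2):
    """Replay the log in time order; each query sees only events logged before it."""
    memory = []
    results = {}
    events = sorted(stream, key=lambda e: e.timestamp)
    i = 0
    for q_time, question in sorted(queries):
        # Ingest only events strictly earlier than the query (temporal causality).
        while i < len(events) and events[i].timestamp < q_time:
            memory.append(events[i])
            i += 1
        results[(q_time, question)] = [e.text for e in retrieve(memory, question, k)]
    return results


if __name__ == "__main__":
    log = [Event(1, "bought coffee at the station"), Event(9, "coffee with Bob")]
    print(online_eval(log, [(5, "where did I get coffee")], k=1))
```

The key design point is that ingestion and querying are interleaved, so a memory system cannot benefit from future events the way it could in an offline, whole-log evaluation.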

Evolutionary Negative Module Pruning for Better LoRA Merging

This paper proposes ENMP, a method that leverages evolutionary search to identify and prune "negative modules" that degrade performance during LoRA merging. Designed as a plug-and-play enhancement, ENMP consistently improves existing merging algorithms across both NLP and vision domains.
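The note does not describe ENMP's search operators or fitness function. A minimal sketch, assuming the search is over a binary keep/prune mask (one bit per LoRA module) and uses a generic elitist evolutionary loop: the all-keep mask (plain merging) is seeded into the population, so the returned mask never scores worse than merging everything. `evolve_mask` and the `fitness` callable are hypothetical; in practice the fitness would merge the kept modules and score the merged model on validation data.

```python
import random


def evolve_mask(n_modules, fitness, pop_size=8, generations=20, mut_rate=0.1, seed=0):
    """Elitist evolutionary search over keep/prune masks.

    mask[i] == 1 keeps LoRA module i in the merge; 0 prunes it.
    `fitness` is a hypothetical user-supplied scorer (higher is better).
    """
    rng = random.Random(seed)
    all_keep = tuple([1] * n_modules)  # baseline: merge every module
    pop = [all_keep] + [
        tuple(rng.randint(0, 1) for _ in range(n_modules)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        # Bit-flip mutation: each bit flips independently with probability mut_rate.
        children = [
            tuple(b ^ 1 if rng.random() < mut_rate else b for b in parent)
            for parent in pop
        ]
        # Keep the best masks among parents and children (elitist selection).
        pop = sorted(set(pop + children), key=fitness, reverse=True)[:pop_size]
    return pop[0]


if __name__ == "__main__":
    # Toy fitness: modules 2 and 5 act as "negative modules" that hurt the merge.
    def toy(mask):
        return sum(mask) - 3 * (mask[2] + mask[5])

    print(evolve_mask(8, toy, generations=30))
```

Because selection is elitist and the baseline mask starts in the population, the best fitness is non-decreasing across generations, which matches the plug-and-play framing: pruning is only kept when it helps.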

GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization

GigaCheck is a dual-strategy framework: document-level classification via a fine-tuned LLM, and span-level detection that treats AI-generated text spans as "objects" and uses a DETR-like architecture for end-to-end character-level localization.
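Treating spans as "objects" means borrowing detection machinery in one dimension. A minimal sketch of DETR-style set matching on character spans: a 1D IoU and an optimal one-to-one assignment of predicted spans to ground-truth spans. This is an illustration, not GigaCheck's implementation; brute force over permutations stands in for the Hungarian algorithm, and the cost uses only `1 - IoU`, whereas DETR's matching cost also includes class probabilities and an L1 term.

```python
from itertools import permutations


def span_iou(a, b):
    """IoU of two half-open character spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_spans(preds, gts):
    """Optimal one-to-one assignment of predicted spans to ground-truth spans.

    Brute force over permutations, fine for a handful of spans.
    Assumes len(preds) >= len(gts), as with DETR's fixed set of queries.
    Returns a list of (pred_index, gt_index) pairs.
    """
    best_cost, best = float("inf"), []
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(1.0 - span_iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, [(p, g) for g, p in enumerate(perm)]
    return best


if __name__ == "__main__":
    preds = [(0, 10), (50, 80), (100, 120)]
    gts = [(48, 82), (2, 10)]
    print(match_spans(preds, gts))
```

Unmatched predictions (here the `(100, 120)` span) would be trained toward a "no object" class, which is what lets a fixed set of queries localize a variable number of AI-generated spans.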

Retrievals Can Be Detrimental: Unveiling the Backdoor Vulnerability of Retrieval-Augmented Diffusion Models

This paper proposes BadRDM, the first backdoor attack framework targeting retrieval-augmented diffusion models (RDMs). By maliciously fine-tuning the retriever via contrastive learning, it establishes a shortcut from trigger tokens to toxic proxy images, achieving attack success rates of 90.9% and 96.4% on class-conditional and text-to-image (T2I) tasks, respectively, while preserving benign generation quality.
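For readers unfamiliar with the contrastive objective behind retriever fine-tuning, here is a minimal sketch of the standard InfoNCE loss for a single query: it is low when the query embedding is closer (by cosine similarity) to its designated positive than to the negatives. The note does not give BadRDM's exact loss, temperature, or negative sampling; the function names and the `temperature=0.07` default are assumptions, and the vectors stand in for encoder outputs.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE for one query: negative log-softmax score of the positive among all candidates."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


if __name__ == "__main__":
    q = [1.0, 0.0]
    print(info_nce(q, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))  # positive aligned: small loss
    print(info_nce(q, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]]))  # positive misaligned: large loss
```

Minimizing this loss over many (query, positive) pairs reshapes the retriever's embedding space, which is why fine-tuning the retriever alone can redirect what gets retrieved for particular inputs while ordinary queries keep their usual neighbors.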