Skip to content

🤖 Robotics & Embodied AI

💬 ACL2026 · 8 paper notes

Can AI-Generated Persuasion Be Detected? Persuaficial Benchmark and AI vs. Human Linguistic Differences

Persuaficial is a high-quality multilingual benchmark covering six languages for AI-generated persuasive text. Systematic evaluation reveals that subtle AI persuasion is harder to detect than human persuasion (F1 drops ~20%), while intensified persuasion is paradoxically easier to detect.

Debating the Unspoken: Role-Anchored Multi-Agent Reasoning for Half-Truth Detection

RADAR uses role-anchored (politician vs scientist) multi-agent debate to detect half-truths — statements that are factually correct but misleading due to omitted context — with dual-threshold adaptive early stopping, consistently outperforming single-agent and traditional multi-agent baselines under noisy retrieval conditions.

DeCoVec: Building Decoding Space based Task Vector for Large Language Models via In-Context Learning

DeCoVec constructs task vectors in the decoding space (output logits) by contrasting few-shot and zero-shot logit distributions: \(\mathbf{v}_\mathcal{T}^t = \mathbf{z}_{\text{icl}}^t - \mathbf{z}_{\text{zs}}^t\), injecting them into decoding via \(\tilde{\mathbf{z}}^t = \mathbf{z}_{\text{de}}^t + \lambda \cdot \mathbf{v}_\mathcal{T}^t\), achieving up to +5.50 average accuracy improvement over standard few-shot baselines across 7 LLMs without any training.

GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models

GRASPrune proposes a globally budget-constrained structured pruning framework that enforces hard mask budget constraints at every training step via Projected Straight-Through Estimator (Projected STE), jointly pruning FFN channels and KV head groups, achieving 12.18 PPL at 50% parameter retention on LLaMA-2-7B with only 6 minutes of single A100 training.

On Safety Risks in Experience-Driven Self-Evolving Agents

This paper systematically studies safety risks of experience-driven self-evolving agents, finding that even experience accumulated solely from harmless tasks causes significant safety degradation (ASR increases 13-49%). The root cause is the execution-oriented nature of accumulated experience, which reinforces action-taking over refusal behaviors.

Reasoning Hijacking: The Fragility of Reasoning Alignment in Large Language Models

This paper introduces "Reasoning Hijacking," a new attack paradigm that manipulates LLM reasoning logic by injecting false decision criteria into the data channel rather than changing task goals, achieving high attack success rates while bypassing intent-detection-based defenses.

VLN-NF: Feasibility-Aware Vision-and-Language Navigation with False-Premise Instructions

VLN-NF is the first benchmark requiring VLN agents to identify false-premise instructions and output NOT-FOUND in 3D partially observable environments. The paper also proposes REV-SPL evaluation metric and ROAM two-stage hybrid framework, achieving 6.1 REV-SPL (+45% over supervised baselines).

XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants

This paper reveals a design vulnerability in AI coding assistants' automatic context collection and proposes Cross-Origin Context Poisoning (XOXO) attacks: poisoning shared codebases via semantics-preserving code transformations (e.g., variable renaming) causes assistants like GitHub Copilot to unknowingly generate vulnerable code, achieving 73.20% average ASR across 8 SOTA models.