🛡️ AI Safety

💬 ACL2026 · 3 paper notes

When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation

This paper constructs FairNews, the first multi-document news summarization dataset with political-leaning labels, and evaluates 13 LLMs under a five-dimensional fairness evaluation framework. It finds that mid-sized models outperform larger ones in both fairness and efficiency, and that entity sentiment similarity is the dimension most resistant to prompt-based debiasing.
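The note doesn't spell out how entity sentiment similarity is computed; a minimal sketch of the general idea, comparing sentiment toward each named entity in the source articles versus the summary, might look like the following. The entity list, sentiment lexicon, and averaging scheme are all illustrative assumptions, not the paper's metric.

```python
# Toy sketch of an entity-sentiment-similarity check (illustrative only;
# the lexicon, sentence splitting, and scoring are hypothetical).
LEXICON = {"praised": 1.0, "strong": 0.5, "criticized": -1.0, "weak": -0.5}

def entity_sentiment(text: str, entity: str) -> float:
    """Average lexicon score of words in sentences mentioning the entity."""
    scores = []
    for sentence in text.split("."):
        if entity in sentence:
            scores += [LEXICON[w] for w in sentence.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def sentiment_similarity(sources: str, summary: str, entities: list[str]) -> float:
    """1 - mean absolute per-entity sentiment gap, scaled to [0, 1]."""
    gaps = [abs(entity_sentiment(sources, e) - entity_sentiment(summary, e))
            for e in entities]
    return 1.0 - sum(gaps) / (2 * len(gaps))

src = "Senator Lee was praised for a strong bill. Critics said Lee was weak on detail."
summ = "Senator Lee was criticized as weak."
print(round(sentiment_similarity(src, summ, ["Lee"]), 3))
```

A summary that flips the tone toward an entity (as above) scores well below a summary that preserves it, which is the behavior a fairness dimension like this is meant to catch.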

XLSR-MamBo: Scaling the Hybrid Mamba-Attention Backbone for Audio Deepfake Detection

This paper proposes XLSR-MamBo, a framework that systematically explores four topology designs and several SSM variants (Mamba2, Hydra, GDN) for Mamba-Attention hybrid architectures in audio deepfake detection. MamBo-3-Hydra achieves competitive performance across multiple benchmarks thanks to Hydra's native bidirectional modeling, and increasing backbone depth effectively mitigates the instability of shallow models.
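The note names four topologies without describing them; the general idea of a hybrid backbone, a repeating pattern of SSM and attention blocks that can be deepened, can be sketched abstractly as below. The block types and the 2:1 pattern are placeholders, not the paper's designs.

```python
# Illustrative stub of a hybrid block stack (pattern and block names are
# hypothetical; real blocks would be Mamba/attention layers, not strings).
def build_backbone(depth: int,
                   pattern: tuple[str, ...] = ("mamba", "mamba", "attn")) -> list[str]:
    """Repeat the block pattern until the stack reaches `depth` layers."""
    return [pattern[i % len(pattern)] for i in range(depth)]

shallow = build_backbone(3)   # one repetition of the pattern
deep = build_backbone(12)     # same pattern, four times as deep
print(shallow, len(deep))
```

Depth scaling in this framing just repeats the same interleaving pattern, which matches the note's observation that deeper backbones stabilize training relative to shallow ones.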

XQ-MEval: A Dataset with Cross-lingual Parallel Quality for Benchmarking Translation Metrics

This paper constructs XQ-MEval, the first translation-evaluation benchmark with cross-lingual parallel quality. It uses semi-automated MQM error injection to generate pseudo-translations of controllable quality, empirically reveals, for the first time, a cross-lingual scoring bias in automatic evaluation metrics, and proposes an LGN normalization strategy that effectively calibrates multilingual metric scores.
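The core trick, degrading a good reference by a controllable number of injected errors to get a pseudo-translation of known quality, can be sketched in a few lines. The placeholder-substitution scheme below is purely illustrative; the paper's semi-automated pipeline injects typed MQM errors, not generic token corruptions.

```python
import random

# Toy sketch of controllable-quality pseudo-translation via error injection
# (the substitution scheme is hypothetical, not the paper's method).
def inject_errors(reference: str, n_errors: int, seed: int = 0) -> str:
    """Replace n_errors randomly chosen words with an error placeholder."""
    rng = random.Random(seed)
    words = reference.split()
    for i in rng.sample(range(len(words)), min(n_errors, len(words))):
        words[i] = "<err>"
    return " ".join(words)

ref = "the quick brown fox jumps over the lazy dog"
print(inject_errors(ref, 2))  # pseudo-translation with two injected errors
```

Because `n_errors` is a dial, the same reference yields a graded family of outputs, which is what lets the benchmark check whether a metric's scores degrade consistently across languages.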