
📈 Time Series

💬 ACL2026 · 8 paper notes

📌 Same area in other venues: 📷 CVPR2026 (7) · 🔬 ICLR2026 (121) · 🧪 ICML2026 (45) · 🤖 AAAI2026 (31) · 🧠 NeurIPS2025 (54) · 📹 ICCV2025 (4)

🔥 Top topics: Time-Series Forecasting ×5 · Reasoning ×3

A Unified Framework for Modeling Heterogeneous Financial Data via Dual-Granularity Prompting: This paper proposes FinLangNet, a dual-module architecture for multi-scale credit risk prediction: DeepFM handles static features, while a Transformer with a dual-granularity prompting mechanism models temporal behavior. Deployed on the Didi Finance platform, it raised the KS statistic by 6.3pp and reduced the bad debt rate by 9.9%.
ODTQA-FoRe: An Open-Domain Tabular Question Answering Dataset for Future Data Forecasting and Reasoning: ODTQA-FoRe introduces an open-domain tabular question answering task focused on future numerical forecasting and post-forecast reasoning. It also provides TimeFore, a three-agent framework that chains table retrieval, SQL data acquisition, specialized time-series forecasting, and answer normalization into an evaluable baseline.
STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation: This paper proposes STK-Adapter, which embeds three MoE modules in each layer of a Large Language Model (LLM): ST-MoE for capturing spatio-temporal structures, EA-MoE for modeling event chain semantics, and CMA-MoE for deep cross-modal alignment. The design targets the spatio-temporal information loss and layer-wise dilution caused by shallow alignment between TKG embeddings and LLMs, and it significantly outperforms SOTA methods on four benchmark datasets.
STReasoner: Empowering LLMs for Spatio-Temporal Reasoning in Time Series via Spatial-Aware Reinforcement Learning: STReasoner uses Network SDEs to synthesize spatio-temporal time series data with graph structures and textual semantics. By combining a time-series encoder, a three-stage training pipeline, and spatial-aware reinforcement learning (S-GRPO), the model learns to reason explicitly about temporal dynamics and spatial dependencies.
Temporal Leakage in Search-Engine Date-Filtered Web Retrieval: A Retrospective Forecasting Case Study: This paper systematically audits the date filters of Google and DuckDuckGo and finds that search engine date filtering fails significantly in retrospective forecasting (RF) evaluations: for \(71\%\) (Google) and \(81\%\) (DuckDuckGo) of questions, at least one retrieved page contains major post-cutoff information leakage. This leakage artificially lowers prediction Brier scores (lower is better) from \(0.24\) to \(0.10\).
Test of Time: Rethinking Temporal Signal of Benchmark Contamination: This paper demonstrates that "performance decay after cutoff" is not robust evidence of benchmark contamination: when questions built from the same set of source documents are converted from the original fill-in-the-blank format to LLM-rephrased questions, the temporal decay signal changes significantly or even disappears.
Time-RA: Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback: This work defines the new Time-RA task, which upgrades time series anomaly detection from binary classification to generative reasoning diagnosis (detection + classification + root cause explanation). It constructs RATs40K, the first multimodal benchmark comprising ~40,000 samples across 10 domains and 20 anomaly types, and validates the feasibility of this paradigm through an AI feedback labeling pipeline and LLM fine-tuning.
TSAQA: Time Series Analysis Question And Answering Benchmark: TSAQA is a unified time series question answering benchmark that casts 6 types of temporal analysis tasks (anomaly detection, classification, representation, comparison, data transformation, and temporal relations) into 3 closed-form question types: true/false (TF), multiple-choice (MC), and the newly proposed puzzling (PZ). Across 13 domains with 210k samples, LLMs and time series foundation models are evaluated under a unified zero-shot protocol. Results indicate that even the strongest commercial model, Gemini-2.5-Flash, achieves an average accuracy of only 65.08%, leaving significant room for improvement.
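
For readers unfamiliar with the Brier score cited in the Temporal Leakage summary above, here is a minimal sketch of how it is computed for binary forecasting questions. The function name `brier_score` and all probabilities and outcomes below are hypothetical illustrations, not data or code from the paper. The score is the mean squared difference between forecast probabilities and realized outcomes, so lower is better, and a forecaster who always answers 0.5 scores 0.25.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if len(probs) == 0:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical resolved questions: 1 = event happened, 0 = it did not.
outcomes = [1, 0, 1, 0]

# A forecaster with no information answers 0.5 everywhere.
uninformed = [0.5, 0.5, 0.5, 0.5]

# A forecaster that has seen post-cutoff pages can be confidently "right".
leaky = [0.9, 0.1, 0.8, 0.2]

print("uninformed:", brier_score(uninformed, outcomes))
print("leaky:     ", brier_score(leaky, outcomes))
```

Because the score rewards confident correct answers, retrieved pages that reveal the outcome push it down without any gain in real forecasting skill, which is why a leakage-driven drop reads as an artifact rather than an improvement.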