TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale¶
Conference: ACL 2026 · arXiv: 2604.21889 · Code: None · Area: LLM Evaluation · Keywords: risk event discovery, customer complaint mining, incident linking, stream processing, SNR optimization
TL;DR¶
TingIS is an end-to-end risk event discovery system deployed on a fintech platform. It employs a five-module architecture—semantic distillation, cascaded routing, event linking engine, state management, and multi-dimensional denoising—to extract actionable risk events from massive noisy customer complaints in real time, achieving a P90 alert latency of 3.5 minutes and a 95% high-priority event discovery rate.
Background & Motivation¶
Background: Large-scale online platforms rely on complex microservices and cloud-native architectures, where even minor failures can rapidly propagate into large-scale incidents. Internal observability systems (metrics/logs/traces) serve as the first line of defense, yet they are not infallible.
Limitations of Prior Work: Customer complaint data represents a critical signal for uncovering monitoring blind spots, but poses extreme challenges due to high noise, high throughput, and semantic complexity. Detecting a systemic failure from only 3 complaints out of a 2,000-per-minute stream presents a severe signal-to-noise ratio (SNR) problem. Low-SNR systems generate excessive false positives, leading to alert fatigue in operations teams.
Key Challenge: A fundamental tension exists among achieving high noise tolerance, minimal latency, and near-zero missed detections simultaneously—satisfying real-time requirements (minute-level), high discovery rate (>95%), and low false-positive rate all at once.
Goal: To build an enterprise-scale system capable of discovering risk events in real time from 300,000+ customer complaints per day.
Key Insight: The problem is decomposed into five orthogonal modules, each addressing a core sub-problem. A hybrid intelligence strategy of "lightweight rule pre-filtering + LLM deep judgment" balances precision and cost.
Core Idea: A multi-stage event linking engine (LSH high-speed clustering → LLM purity checking → cross-batch historical association → time-decay weighted scoring) achieves semantic convergence and identity persistence, complemented by cascaded routing and multi-dimensional denoising to maintain overall system SNR.
Method¶
Overall Architecture¶
TingIS consists of three layers (data observation, semantic engine, and long-term memory) and five orthogonal modules: M1 Semantic Distillation, M2 Cascaded Routing, M3 Event Linking Engine, M4 Event State Management, and M5 Multi-dimensional Denoising. The design follows three core principles: semantic convergence with identity persistence, hybrid intelligence collaboration, and multi-constraint SNR balancing.
Key Designs¶
1. Semantic Distillation Module (M1)
- Function: Transforms raw customer complaint text into unambiguous semantic units.
- Mechanism: Qwen3-8B generates initial summaries in a "subject + problem" format (e.g., "credit card online payment + discount error"), filtering emotional expressions, PII, and irrelevant details; BGE-M3 then converts these summaries into vectors.
- Design Motivation: Raw complaint text is noisy, colloquial, and highly diverse. High-density semantic representations must be created at controllable computational cost.
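The M1 pipeline above (LLM summarization into a "subject + problem" unit, then embedding) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the real system calls Qwen3-8B and BGE-M3, which are stubbed here so the sketch runs standalone; the prompt wording and the hashing-based toy embedding are assumptions.

```python
# Sketch of M1 semantic distillation: raw complaint -> "subject + problem"
# summary -> dense vector. LLM and embedder are stand-in stubs.

DISTILL_PROMPT = (
    "Summarize the complaint as '<subject> + <problem>'. "
    "Drop emotional language, PII, and irrelevant detail.\n\nComplaint: {text}"
)

def distill(text: str, llm=None) -> str:
    """Produce an unambiguous 'subject + problem' semantic unit."""
    if llm is None:
        # Stub standing in for a Qwen3-8B call; a real deployment would send
        # DISTILL_PROMPT.format(text=text) to the model.
        return "credit card online payment + discount error"
    return llm(DISTILL_PROMPT.format(text=text))

def embed(summary: str, encoder=None) -> list[float]:
    """Map the distilled summary to a dense vector (BGE-M3 in the paper)."""
    if encoder is None:
        # Toy hashing embedding so the sketch runs without model weights.
        import hashlib
        digest = hashlib.sha256(summary.encode()).digest()
        return [b / 255.0 for b in digest[:8]]
    return encoder(summary)

summary = distill("I'm furious!! My card discount didn't apply at checkout!!!")
vector = embed(summary)
```

Keeping distillation and embedding as separate, swappable callables mirrors the paper's point about controllable computational cost: either model can be replaced independently.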
2. Cascaded Routing Module (M2)
- Function: Accurately attributes complaints to the corresponding business domain (biz_code).
- Mechanism: A two-stage strategy—keyword matching (high precision, for explicit complaints) → vector retrieval + BGE-Reranker re-ranking (high recall, for ambiguous complaints). The reranker operates within a Top-10 candidate pool to satisfy latency constraints.
- Design Motivation: Semantic differences across business domains are large; accurate routing is a prerequisite for subsequent event discovery.
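The two-stage routing cascade can be sketched in a few lines. This is an illustrative sketch only: the keyword table, the toy retriever, and the token-overlap reranker are stand-ins for the production knowledge base, vector index, and BGE-Reranker; all `biz_code` values are made up.

```python
# Sketch of M2 cascaded routing: keyword match first (precision), then
# vector retrieval + reranking within a Top-10 pool (recall, bounded latency).

KEYWORD_MAP = {"credit card": "biz_credit_card", "wire transfer": "biz_transfer"}

def route(summary: str, retrieve, rerank, top_k: int = 10) -> str:
    # Stage 1: high-precision keyword match for explicit complaints.
    for kw, biz_code in KEYWORD_MAP.items():
        if kw in summary:
            return biz_code
    # Stage 2: high-recall retrieval, reranked only within the Top-10
    # candidates to satisfy the latency constraint.
    candidates = retrieve(summary)[:top_k]
    return max(candidates, key=lambda c: rerank(summary, c))

# Toy stand-ins for the vector index and BGE-Reranker.
def toy_retrieve(summary):
    return ["biz_loans", "biz_insurance", "biz_transfer"]

def toy_rerank(summary, biz_code):
    return len(set(summary.split()) & set(biz_code.split("_")))
```

The cascade only pays the reranking cost for ambiguous complaints that fall through the keyword stage, which is the cost/precision trade-off the module is built around.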
3. Multi-stage Event Linking Engine (M3)
- Function: Determines whether multiple complaints point to the same underlying risk event.
- Mechanism: Two steps—(a) Intra-batch efficient aggregation: partition by biz_code → LSH high-speed clustering → LLM purity checking (impure clusters are split and titled); (b) Cross-batch historical association: match against historical events using time-decay weighted semantic similarity \(s^* = s \cdot e^{-k\Delta t}\), with the LLM as the final arbiter for merge/create decisions when the score exceeds the threshold.
- Design Motivation: This is the system's core challenge—accurately determining "event identity" across complaints from different times and phrasings. LSH ensures efficiency; the LLM ensures precision; time decay prevents historical inertia from incorrectly absorbing new complaints.
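The cross-batch association step can be sketched directly from the formula \(s^* = s \cdot e^{-k\Delta t}\). The decay rate `k`, the threshold, and the stubbed arbiter are illustrative assumptions; the paper uses Kimi-K2 as the merge/create arbiter.

```python
# Sketch of M3 cross-batch linking: decay the semantic similarity by event
# age, then ask an LLM arbiter (stubbed) only when the score clears the bar.
import math

def decayed_score(sim: float, dt_hours: float, k: float = 0.1) -> float:
    """Time-decay weighted similarity: s* = s * exp(-k * dt)."""
    return sim * math.exp(-k * dt_hours)

def link_decision(sim: float, dt_hours: float, threshold: float = 0.6,
                  arbiter=None) -> str:
    """Merge into a historical event only if the decayed score exceeds the
    threshold AND the LLM arbiter (here a trivial stub) confirms."""
    if decayed_score(sim, dt_hours) < threshold:
        return "create_new_event"
    confirm = arbiter if arbiter is not None else (lambda: True)
    return "merge" if confirm() else "create_new_event"
```

With `k = 0.1` per hour, a complaint that is 0.9-similar to a 2-hour-old event still merges (\(s^* \approx 0.74\)), while the same similarity against a day-old event decays to \(s^* \approx 0.08\) and spawns a new event, which is exactly how time decay keeps stale events from absorbing fresh complaints.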
Key Experimental Results¶
Main Results (One Month of Online Deployment)¶
| Metric | Value |
|---|---|
| Daily complaints processed | 300,000+ |
| Peak throughput | 2,000 complaints/min |
| P90 alert latency | 3.5 minutes |
| High-priority event discovery rate | 95% |
Ablation Study¶
| Evaluation Dimension | Baseline | TingIS |
|---|---|---|
| Routing accuracy | Baseline method | Significantly outperforms baseline |
| Clustering quality | Baseline method | Significantly outperforms baseline |
| Signal-to-noise ratio (SNR) | Baseline method | Significantly outperforms baseline |
Key Findings¶
- Alert triggered from as few as 3 complaints: The system can detect a potential risk event from only 3 correlated complaints, which is critical for early warning.
- Hybrid intelligence strategy effectively reduces LLM cost: Rule-based pre-filtering substantially reduces input volume; LSH and similarity thresholds gate expensive LLM calls; persistent historical state yields incremental efficiency gains.
- Alert penetration mechanism balances fatigue prevention and emergency response: A 2-hour silence period prevents alert fatigue under normal conditions, but the silence window is automatically bypassed when an explosive growth pattern is detected.
- Modular design supports low-cost maintenance: The five orthogonal modules can be upgraded independently (e.g., replacing a stronger LLM or a faster embedding model).
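The alert-penetration finding above reduces to a simple gate. The 2-hour silence window is from the paper; the "explosive growth" test used here, the complaint count more than tripling between windows, is an illustrative stand-in for the production detector.

```python
# Sketch of alert penetration: suppress repeat alerts inside a 2-hour
# silence window, but bypass the window when growth looks explosive.

SILENCE_SECONDS = 2 * 3600

def should_alert(last_alert_ts: float, now_ts: float,
                 prev_count: int, curr_count: int) -> bool:
    # Illustrative explosive-growth test: count more than triples.
    explosive = curr_count > 3 * max(prev_count, 1)
    if explosive:
        return True  # emergency: penetrate the silence window
    return (now_ts - last_alert_ts) >= SILENCE_SECONDS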
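The alert-penetration finding above reduces to a simple gate. The 2-hour silence window is from the paper; the "explosive growth" test used here, the complaint count more than tripling between windows, is an illustrative stand-in for the production detector.

```python
# Sketch of alert penetration: suppress repeat alerts inside a 2-hour
# silence window, but bypass the window when growth looks explosive.

SILENCE_SECONDS = 2 * 3600

def should_alert(last_alert_ts: float, now_ts: float,
                 prev_count: int, curr_count: int) -> bool:
    # Illustrative explosive-growth test: count more than triples.
    explosive = curr_count > 3 * max(prev_count, 1)
    if explosive:
        return True  # emergency: penetrate the silence window
    return (now_ts - last_alert_ts) >= SILENCE_SECONDS
```

The design choice is the asymmetry: suppression is time-based and cheap, while the bypass condition is the only part that needs tuning against real incident growth curves.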
Highlights & Insights¶
- Complete industrial-grade system design: This goes beyond algorithmic innovation, presenting a full engineering solution spanning data observation → semantic processing → event management → denoising → alerting.
- Elegantly designed three-layer data model: The state layer (real-time decision-making), audit layer (immutable evidence chain), and snapshot layer (historical baseline) decouple data requirements across different dimensions.
- Time-decay semantic association mechanism: \(s^* = s \cdot e^{-k\Delta t}\) concisely fuses semantic similarity with temporal proximity, preventing old events from incorrectly absorbing new complaints.
- Hybrid intelligence paradigm worth emulating: The progressive "rule → retrieval → LLM" invocation strategy maintains precision while controlling computational cost.
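The three-layer data model above (state / audit / snapshot) can be sketched with plain dataclasses. The layer names and their roles come from the note; every field name is an assumption for illustration.

```python
# Sketch of the three-layer data model: only the state layer is mutable;
# audit records and snapshots are frozen once written.
from dataclasses import dataclass

@dataclass
class EventState:          # state layer: mutable, drives real-time decisions
    event_id: str
    status: str
    complaint_count: int = 0

@dataclass(frozen=True)
class AuditRecord:         # audit layer: immutable evidence chain
    event_id: str
    action: str
    timestamp: float

@dataclass(frozen=True)
class Snapshot:            # snapshot layer: frozen historical baseline
    event_id: str
    complaint_count: int
    taken_at: float
```

Enforcing immutability at the type level is what decouples the layers: real-time logic can freely mutate `EventState`, while the audit and snapshot layers stay tamper-evident by construction.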
Limitations & Future Work¶
- Insufficient presentation of experimental details: The HTML version of the paper is truncated in the experimental section; detailed offline benchmark data is not fully presented.
- Strong domain specificity: The system is designed for fintech customer complaints; transferring it to other domains requires adapting the routing knowledge base and denoising strategies.
- LLM dependency: Core purity checking and merge arbitration rely on an LLM (Kimi-K2), which may face latency variability and cost concerns.
- Cold-start problem: System performance may degrade for new business domains that lack historical events and keyword knowledge bases.
- Multilingual scenarios not discussed: Although complaints may span multiple languages, the paper does not address multilingual support.
Related Work & Insights¶
- BGE-M3 (2024): The embedding and reranker model adopted in this work, providing high-quality multilingual semantic representations.
- Qwen3-8B (2025): The LLM used for semantic distillation, balancing output quality and inference cost.
- Kimi-K2 (2025): Used for purity checking and merge arbitration in the event linking engine, providing high-quality reasoning.
- LSH (Locality-Sensitive Hashing): A classical approximate nearest-neighbor algorithm, used here for high-speed pre-clustering.
Rating¶
- Novelty: ⭐⭐⭐ — The system design integrates multiple existing techniques; innovation is reflected more in engineering integration and module synergy than in algorithmic novelty.
- Experimental Thoroughness: ⭐⭐⭐ — Online deployment data demonstrates system viability, but offline comparison experiment details are insufficiently complete.
- Writing Quality: ⭐⭐⭐⭐ — System architecture is described clearly, inter-module relationships are articulated precisely, and real-world cases enhance readability.
- Value: ⭐⭐⭐⭐ — Offers strong reference value for building enterprise-grade LLM application systems, demonstrating engineering practice for LLMs in real-time stream processing scenarios.