Online Prediction of Stochastic Sequences with High Probability Regret Bounds¶

Conference: ICLR 2026 arXiv: 2602.16236 Code: None Area: Reinforcement Learning / Online Learning Theory Keywords: Online prediction, stochastic sequences, high-probability regret bounds, universal prediction, countable alphabet

TL;DR¶

This paper revisits the classical problem of universal prediction of stochastic sequences over a finite horizon \(T\), and establishes, for the first time, vanishing regret bounds that hold with high probability in the form \(O(T^{-1/2}\delta^{-1/2})\). These bounds closely mirror the existing expected regret bound of \(O(T^{-1/2})\), and the paper further proves that the exponent of \(\delta\) cannot be improved without additional assumptions.

Background & Motivation¶

Universal prediction is a classical problem in information theory and online learning: given a sequence generated by an unknown stochastic process, the learner aims to predict each next observation optimally, minimizing cumulative loss (regret) relative to a oracle strategy that knows the true distribution.

Existing literature primarily presents regret bounds for universal prediction in expectation. For stochastic processes over countable alphabets, classical results yield an expected regret convergence rate of \(O(T^{-1/2})\). However, expected bounds have fundamental limitations:

Uncontrolled tail risk: An expected bound only guarantees good average performance, but does not preclude arbitrarily large regret on certain (potentially non-negligible) sample paths.

Unsuitability for risk-sensitive settings: In safety-critical or financial risk-management applications, one requires stronger guarantees of the form "except with negligibly small probability, regret never exceeds a given threshold."

Theoretical gap: Whether expected bounds can be lifted to high-probability bounds without significantly sacrificing convergence rates had remained an open question.

This paper answers that question affirmatively: such a lift is possible, yielding a high-probability bound that closely matches the expected bound at the cost of an additional factor of \(\delta^{-1/2}\).

Method¶

Overall Architecture¶

The paper considers the following online prediction setting: - Alphabet: A countable set \(\mathcal{X}\) (possibly countably infinite) - Horizon: Finite \(T\), known to the learner - Stochastic process: The sequence \(X_1, X_2, \ldots, X_T\) is generated by an unknown stationary ergodic process \(P\) - Objective: Design a predictor \(\hat{q}_t(\cdot | X^{t-1})\) that minimizes the cumulative KL divergence (regret) relative to the true conditional distribution \(p_t(\cdot | X^{t-1})\) - Regret definition: \(R_T = \sum_{t=1}^T D_{\text{KL}}(p_t \| \hat{q}_t)\), or equivalently in terms of log-loss

Key Designs¶

Lifting from expected to high-probability bounds:
- Known expected bound: \(E[R_T] = O(\sqrt{T})\), i.e., \(E[R_T / T] = O(T^{-1/2})\)
- This paper's high-probability bound: \(P(R_T / T \geq \epsilon) \leq \delta\), where \(\epsilon = O(T^{-1/2} \delta^{-1/2})\)
- Equivalently, with probability at least \(1 - \delta\), the regret rate does not exceed \(O(T^{-1/2}\delta^{-1/2})\)
Technical approach:
- A naive application of Markov's inequality to the expected bound yields only \(O(T^{-1/2}\delta^{-1})\), with \(\delta\)-exponent equal to \(-1\)
- This paper achieves the improved exponent of \(-1/2\) via a more refined analysis, likely exploiting mixing properties of the stochastic process, sharper concentration inequalities, or Azuma–Hoeffding-type martingale arguments
- Although seemingly incremental, this improvement is theoretically significant
Impossibility result:
- The paper proves that without additional assumptions (e.g., stronger mixing conditions or exponential moment conditions), the \(\delta^{-1/2}\) exponent is optimal and cannot be further improved
- The lower bound is established via construction of adversarial instances
- Together, the upper and lower bounds provide a tight and complete characterization of the dependence on \(\delta\)
Handling countable alphabets:
- Universal prediction over countable alphabets is more challenging than over finite alphabets, due to a richer model class
- Uniform convergence over infinite-dimensional distribution spaces must be carefully handled
- The results in this paper apply to this more general setting

Loss & Training¶

This is a purely theoretical work with no training procedure. The key mathematical tools include: - Log-loss as the standard prediction loss function - KL divergence as the measure of regret - Martingale concentration inequalities for high-probability analysis - Information-theoretic tools (mutual information, conditional entropy, etc.) for stochastic process prediction

Key Experimental Results¶

Main Results¶

This paper makes purely theoretical contributions; the main results are stated as theorems and propositions:

Result Type	Bound Form	Probability Guarantee	Alphabet	Remarks
Prior expected bound	\(E[R_T/T] = O(T^{-1/2})\)	Expectation	Countable	Known from literature
This paper's high-prob. bound	\(R_T/T \leq O(T^{-1/2}\delta^{-1/2})\)	\(\geq 1-\delta\)	Countable	New result
Markov inequality conversion	\(R_T/T \leq O(T^{-1/2}\delta^{-1})\)	\(\geq 1-\delta\)	—	Trivial conversion
This paper's lower bound	\(\delta^{-1/2}\) exponent is unimprovable	—	—	Matches upper bound

Ablation Study¶

Method / Condition	\(\delta\) Exponent	Optimal?	Remarks
Expected bound + Markov inequality	\(-1\)	No	Overly coarse conversion
This paper's high-prob. analysis	\(-1/2\)	Yes	Improved via refined analysis
Strengthened assumptions (e.g., strong mixing)	Potentially better	Condition-dependent	Requires additional assumptions

Key Findings¶

The cost of lifting to high probability is a square root: Moving from \(O(T^{-1/2})\) to \(O(T^{-1/2}\delta^{-1/2})\) incurs an additional \(\delta^{-1/2}\) factor that is both necessary and sufficient.
Root cause of the impossibility: The inherent variability of the stochastic process limits the degree of high-probability concentration, in contrast to the stronger concentration phenomena observed in the i.i.d. setting.
Comparison with finite-alphabet results: The results for countable alphabets match those for finite alphabets in form, though the proofs require more sophisticated techniques.

Highlights & Insights¶

Elegant main theorem: The bound \(O(T^{-1/2}\delta^{-1/2})\) closely mirrors the expected bound \(O(T^{-1/2})\); the additional \(\delta^{-1/2}\) factor elegantly quantifies the "price of a probabilistic guarantee."
Matching upper and lower bounds: The paper not only establishes a high-probability bound but also proves its optimality via an impossibility result — a rare and valuable combination in theoretical research.
Revisiting a classical problem: Finding an open and natural question within a well-studied classical problem and resolving it completely demonstrates considerable theoretical depth.
Bridging two research traditions: The work organically connects universal prediction from information theory with high-probability analysis from online learning.

Limitations & Future Work¶

Knowledge of \(T\) required: The learner must know the time horizon \(T\) in advance; extensions to the anytime setting are worth exploring.
Stationarity assumption: Generalization to non-stationary or adversarial environments is an important open direction.
Computational efficiency: Whether the theoretical results correspond to computationally efficient prediction algorithms warrants further investigation.
More general loss functions: The paper focuses on log-loss/KL divergence; extensions to other losses (e.g., squared loss, absolute loss) remain to be studied.
Finite-sample constants: The constants hidden in \(O(\cdot)\) notation and their practical implications deserve further analysis.

Rissanen's stochastic complexity theory: The deep connection between universal prediction and data compression; the regret bounds in this paper can also be interpreted as bounds on codelength.
Concentration in online learning: High-probability analysis techniques in online learning (e.g., PAC-Bayes bounds, martingale methods) provide key technical tools.
Cesà-Bianchi & Lugosi's online convex optimization: This paper transfers high-probability analysis techniques from that framework to the universal prediction problem.
The impossibility proof techniques developed here may inspire lower bound arguments in other online learning problems.

Rating¶

Novelty: ⭐⭐⭐⭐ — A new answer to a classical problem; technically non-trivial, though conceptual novelty is moderate
Experimental Thoroughness: ⭐⭐⭐ — Purely theoretical; no empirical validation, but the theorems themselves constitute the core contribution
Writing Quality: ⭐⭐⭐⭐ — Clear exposition of theoretical content demands a high level of craft
Value: ⭐⭐⭐⭐ — Closes a theoretical gap; the matching of upper and lower bounds achieves mathematical completeness