Scientists investigate fundamental limits on quantifying the difference between probability distributions, presenting a generalized Pinsker inequality for Bregman divergences derived from negative Tsallis entropies. Guglielmo Beretta (Università Ca’ Foscari Venezia and Politecnico di Torino), Tommaso Cesari (University of Ottawa), Roberto Colomboni (Politecnico di Milano and Università degli Studi di Milano), and co-authors demonstrate a novel bound relating these divergences to total variation, extending the classical Pinsker inequality. This research is significant because it provides a crucial tool for analysing probabilistic prediction with Tsallis losses and for advancing online learning algorithms, offering improved control over excess risk in statistical inference.
Optimal bounds relating Bregman divergence and total variation distance using Tsallis entropies
Researchers have established a generalized Pinsker inequality for Bregman divergences derived from negative α-Tsallis entropies, also known as β-divergences. Motivated by applications in probabilistic prediction using Tsallis losses and online learning algorithms, this work provides a fundamental link between excess risk and distributional closeness.
Specifically, the study rigorously demonstrates that for any probability distributions p and q within the probability simplex, the Bregman divergence Dα(p∥q) is bounded from below in terms of the squared ℓ1 distance between the two distributions: Dα(p∥q) ≥ (Cα,K / 2) · ∥p − q∥₁². The breakthrough lies in the explicit determination of the optimal constant Cα,K for every possible choice of the parameters α and K.
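To make the bound concrete, here is a minimal Python sketch (not the authors' code) that implements the Bregman divergence generated by the negative α-Tsallis entropy under one common normalization, φα(p) = (Σᵢ pᵢ^α − 1)/(α − 1), and spot-checks the inequality at α = 1, where the optimal constant reduces to the classical Pinsker value Cα,K = 1; the normalization and helper names are assumptions made for illustration.

```python
import numpy as np

def rand_simplex(rng, K):
    """Draw a point from the interior of the K-dimensional probability simplex."""
    x = np.clip(rng.dirichlet(np.ones(K)), 1e-12, None)
    return x / x.sum()

def bregman_tsallis(p, q, alpha):
    """Bregman divergence D_alpha(p||q) of the negative alpha-Tsallis entropy
    phi_alpha(p) = (sum_i p_i^alpha - 1) / (alpha - 1)  (assumed normalization).
    At alpha = 1 it reduces to the Kullback-Leibler divergence on the simplex."""
    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(p / q)))
    return float(np.sum(p**alpha + (alpha - 1) * q**alpha
                        - alpha * p * q**(alpha - 1)) / (alpha - 1))

def pinsker_gap(p, q, alpha, C):
    """D_alpha(p||q) - (C/2) * ||p - q||_1^2; nonnegative whenever the bound holds."""
    return bregman_tsallis(p, q, alpha) - 0.5 * C * np.abs(p - q).sum() ** 2

# Spot-check alpha = 1 (classical Pinsker, C_{1,K} = 1) over random pairs.
rng = np.random.default_rng(0)
K = 5
worst = min(pinsker_gap(rand_simplex(rng, K), rand_simplex(rng, K), 1.0, 1.0)
            for _ in range(10_000))
print(f"smallest observed gap: {worst:.6f}")  # should never be negative
```

Plugging in other values of α together with the constants reported in Table 1 would exercise the general statement in the same way.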
The researchers employed a novel variational approach, characterizing Cα,K through the Hessian of the α-Tsallis entropy and reducing the constant’s computation to a parametric quadratic form optimization. This methodology not only recovers the classic Pinsker inequality for the Shannon entropy when α equals 1, but also reveals distinct regimes in the constant’s behaviour as α and K vary, as detailed in Table 1.
The implications of this generalized inequality extend to several areas of applied mathematics and machine learning. By converting bounds on excess risk into measures of total variation, the research facilitates improved analysis of predictive distributions and plug-in rules. Furthermore, the findings provide crucial insights into the strong convexity of Tsallis entropies, which is essential for understanding the performance of online learning algorithms and convex optimization methods. The work’s results are particularly relevant to applications involving robust inference, signal processing, and data analysis where Tsallis losses and β-divergences are commonly employed.
Establishing Pinsker bounds via α-Tsallis entropy Hessian optimisation
Researchers established a generalized Pinsker inequality for Bregman divergences generated by negative α-Tsallis entropies, also known as β-divergences. The study focused on bounding these divergences, which generalize the Kullback-Leibler divergence, from below in terms of total variation, providing a method to convert Dα-control into ∥·∥₁-control in probabilistic prediction.
Specifically, the work proves that for any pair of probability distributions p and q in the relative interior of the probability simplex, and for every choice of the parameters α and K, the inequality Dα(p∥q) ≥ (Cα,K / 2) · ∥p − q∥₁² holds. To determine the optimal constant for this bound, the researchers employed a variational characterization, linking it to the Hessian of the α-Tsallis entropy.
This approach reduced the computation of the sharp constant to optimizing a parametric quadratic form over tangent ∥·∥₁-unit directions. The methodology recovers the classical Pinsker inequality for α = 1, demonstrating consistency with established information theory. The investigation meticulously details the relationship between excess risk and total variation distance, utilizing the Bayes risk associated with the Tsallis loss function.
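The paper's exact variational formula is not reproduced in this summary, so the sketch below is only one plausible reading of this step: it assumes the constant is characterized as the infimum, over points of the simplex and tangent directions of unit ℓ1 norm, of the quadratic form of the Tsallis Hessian. The random search merely gives an upper estimate of that infimum, whereas the paper solves the optimization exactly.

```python
import numpy as np

def hessian_quadratic_form(p, v, alpha):
    """v^T H(p) v for the diagonal Hessian of phi_alpha(p) = (sum p_i^alpha - 1)/(alpha - 1),
    whose entries are alpha * p_i^(alpha - 2); at alpha = 1 this is the 1/p_i Hessian
    of the negative Shannon entropy."""
    return float(np.sum(alpha * p ** (alpha - 2) * v ** 2))

def crude_constant_estimate(alpha, K, n_samples=50_000, seed=0):
    """Monte Carlo upper estimate of
        inf_p inf_{v : sum(v) = 0, ||v||_1 = 1}  v^T H(p) v,
    an assumed reading of the variational characterization described in the text."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_samples):
        p = np.clip(rng.dirichlet(np.ones(K)), 1e-9, None)
        p = p / p.sum()
        v = rng.standard_normal(K)
        v -= v.mean()              # project onto the tangent space sum(v) = 0
        v /= np.abs(v).sum()       # rescale to the l1 unit sphere
        best = min(best, hessian_quadratic_form(p, v, alpha))
    return best

# For alpha = 1 the sharp constant is the classical Pinsker constant 1; the random
# search stays above that value and approaches it only with a finer search.
print(crude_constant_estimate(alpha=1.0, K=3))
```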
By establishing a Pinsker-type inequality for the Bregman divergence of the negative α-Tsallis entropy, the study converts excess-risk bounds into interpretable measures of predictive distribution control. This conversion facilitates excess-risk bounds for the 0-1 loss of the corresponding plug-in rules, leveraging the standard intermediate inequality.
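As an illustration of that conversion, the following sketch checks, on random class-probability vectors, the chain "excess 0-1 risk of the argmax plug-in rule ≤ ∥p − q∥₁ ≤ √(2 Dα(p∥q)/Cα,K)", shown here at α = 1 with the reported constant C₁,K = 1; whether this is exactly the intermediate inequality the authors invoke is an assumption.

```python
import numpy as np

def excess_01_risk(p, q):
    """Excess conditional 0-1 risk of the plug-in rule that predicts argmax(q)
    when the true class-probability vector is p."""
    return p[np.argmax(p)] - p[np.argmax(q)]

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
K = 4
for _ in range(5):
    p = np.clip(rng.dirichlet(np.ones(K)), 1e-12, None); p /= p.sum()
    q = np.clip(rng.dirichlet(np.ones(K)), 1e-12, None); q /= q.sum()
    lhs = excess_01_risk(p, q)        # 0-1 excess risk of the plug-in rule
    mid = np.abs(p - q).sum()         # l1 distance (twice the total variation)
    rhs = np.sqrt(2 * kl(p, q))       # from D_1(p||q) >= (1/2) ||p - q||_1^2
    print(f"{lhs:.3f} <= {mid:.3f} <= {rhs:.3f}")
```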
The research further demonstrates the applicability of Tsallis entropies as regularizers in online learning and multi-armed bandit problems, highlighting their role in inducing specific geometries and data-dependent behaviours. A key innovation lies in the explicit determination of the constant Cα,K, which varies depending on the values of α and K, reflecting changes in the geometry of the Bregman divergence.
Table 1 summarizes these constants, showcasing dimension-free results and polynomial dependencies on K for specific α regimes. The study provides a comprehensive analysis of the constant’s behaviour, including a phase change at α = 3 and a proof that the constant vanishes when α exceeds 2 and K is greater than or equal to 3.
Pinsker inequalities for α-Tsallis divergences and their relation to β-divergences
Researchers established a generalized Pinsker inequality for Bregman divergences generated by negative α-Tsallis entropies, also known as β-divergences. For any α, the study proves the lower bound Dα(p∥q) ≥ (Cα,K / 2) · ∥p − q∥₁², where p and q reside in the relative interior of the probability simplex. Explicit optimal constants for every choice of α and K were determined, revealing a correction term that diminishes as K approaches infinity.
Specifically, for the two-class problem, a Pinsker-type inequality continues to hold even when α is greater than 2, preserving the excess-risk-to-∥·∥₁ conversion crucial for binary classification. The work also demonstrates that for K equal to 2, the Bregman divergences generated by negative α-Tsallis entropies coincide with β-divergences where β equals α.
Calculations confirm that D1(p∥q) is equivalent to the Kullback-Leibler divergence DKL(p∥q) when p and q are within the relative interior of the K-dimensional probability simplex. Analysis of the constant Cα,K, which represents the sharp Pinsker constant, reveals its behaviour across different values of α and K.
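A quick numerical sanity check of this identification, under the same assumed normalization as in the earlier sketches, is to let α approach 1 and watch the Bregman divergence converge to the Kullback-Leibler divergence:

```python
import numpy as np

def bregman_tsallis(p, q, alpha):
    # Bregman divergence of phi_alpha(p) = (sum p_i^alpha - 1)/(alpha - 1), alpha != 1.
    return float(np.sum(p**alpha + (alpha - 1) * q**alpha
                        - alpha * p * q**(alpha - 1)) / (alpha - 1))

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(4))
q = rng.dirichlet(np.ones(4))
kl = float(np.sum(p * np.log(p / q)))
for alpha in (1.1, 1.01, 1.001):
    print(alpha, bregman_tsallis(p, q, alpha), kl)  # the two values converge as alpha -> 1
```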
For K equal to 2, the constant Cα,K stays fixed at 1 across the tested range of α values. For K equal to 3, by contrast, Cα,K decreases as α grows, from approximately 2.1 at α = 0.5 to around 1.2 at α = 4.5. These findings give a detailed picture of the relationship between Tsallis entropies, Bregman divergences, and Pinsker-type inequalities, with implications for probabilistic prediction and online learning algorithms.
Optimal Bregman divergence constants and implications for learning theory
A sharp Pinsker-type inequality has been established for Bregman divergences induced by negative Tsallis entropies, with an explicitly determined optimal constant for every combination of parameters. The result generalizes the classical Pinsker inequality, which relates the Kullback-Leibler divergence to total variation, to Bregman divergences generated by negative Tsallis entropies, also known as β-divergences.
The research details how this constant behaves, revealing phase transitions including a breakdown for values of α greater than two and specific effects related to dimension and parity when α is between one and two. The findings have direct implications for learning theory, offering a tight method to convert control over the excess risk of Tsallis losses into total-variation control across predictive distributions.
Furthermore, the work yields a principled approach to derive multiclass 0-1 classification regret bounds from Tsallis surrogate performance, clarifying when this conversion is independent of dimensionality and when it may degrade with increasing K. In the context of online learning, the results identify the optimal strong convexity of Tsallis regularizers, refining constants used in Follow-the-Regularized-Leader and Mirror Descent analyses and demonstrating how the choice of α influences the underlying algorithm geometry.
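To show where such strong-convexity constants enter in practice, here is a minimal sketch (not the authors' algorithm) of Follow-the-Regularized-Leader on the probability simplex with a negative Tsallis-entropy regularizer, solved numerically at each round; the regularizer normalization, learning rate, and toy losses are assumptions, and the paper's sharp constants would appear only in the accompanying regret analysis and the tuning of the learning rate.

```python
import numpy as np
from scipy.optimize import minimize

def neg_tsallis(p, alpha):
    """Negative alpha-Tsallis entropy, phi_alpha(p) = (sum p_i^alpha - 1)/(alpha - 1)
    (assumed normalization); alpha = 1 is the negative Shannon entropy."""
    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(np.clip(p, 1e-12, None))))
    return float((np.sum(p**alpha) - 1.0) / (alpha - 1.0))

def ftrl_step(cum_loss, alpha, eta, K):
    """One Follow-the-Regularized-Leader step on the simplex:
        p_t = argmin_p  <cum_loss, p> + (1/eta) * neg_tsallis(p, alpha),
    solved numerically for illustration; alpha = 1 recovers exponential weights."""
    objective = lambda p: float(cum_loss @ p) + neg_tsallis(p, alpha) / eta
    res = minimize(objective, np.full(K, 1.0 / K), method="SLSQP",
                   bounds=[(1e-9, 1.0)] * K,
                   constraints=({"type": "eq", "fun": lambda p: p.sum() - 1.0},))
    return res.x

rng = np.random.default_rng(3)
K, eta, alpha = 3, 0.5, 1.5
cum_loss = np.zeros(K)
for t in range(5):
    p = ftrl_step(cum_loss, alpha, eta, K)
    cum_loss += rng.uniform(size=K)   # toy losses revealed after playing p
    print(t, np.round(p, 3))
```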
The authors acknowledge limitations related to the specific parameter ranges and the focus on Bregman divergences generated by negative Tsallis entropies. Future research could explore the extension of these findings to other divergence measures and investigate the practical performance of algorithms leveraging these theoretical results in various machine learning applications. The established inequality serves as a foundational element for further investigation into the interplay between information theory and learning algorithms.
👉 More information
🗞 Generalized Pinsker Inequality for Bregman Divergences of Negative Tsallis Entropies
🧠 ArXiv: https://arxiv.org/abs/2602.05744
