The question of how differences within groups compare, rather than simply comparing the groups themselves, presents a significant challenge across many scientific disciplines. Johan F. Hoorn from The Hong Kong Polytechnic University, along with colleagues, addresses this problem by introducing a novel statistical measure, the correlation-of-divergency coefficient, termed c-delta. This new approach moves beyond traditional correlation methods by quantifying whether the pattern of internal divergence within one group mirrors that of another, offering a unique way to assess similarity based on how values differ. The team demonstrates that c-delta has broad potential applications, ranging from fundamental physics and genetics to complex systems such as social networks and manufacturing processes, and provides a powerful new tool for benchmarking, clustering and understanding variability structures.
Divergence Pattern Similarity Using Correlation of Divergency
This research introduces a new statistical measure, the Correlation of Divergency (cδ), designed to quantify the similarity of internal divergence patterns between two datasets. Whereas traditional correlation measures the association between paired values, cδ looks at how spread out the values are within each dataset and asks whether the two patterns of deviation resemble each other. The measure can be calculated using either squared differences or absolute differences between data points, offering flexibility for various applications. The authors envision a wide range of uses for cδ: comparing quantum states in physics, assessing gene expression in genetics, quantifying variance structures in machine learning features, comparing network structures in social network analysis, comparing test scores in psychometrics, evaluating variability in manufacturing quality control, and comparing traits between species in evolutionary biology. It also holds promise for validating clusters in data analysis.
The approach addresses a gap in statistical methodology by focusing on divergence patterns, but it comes with limitations. Unlike traditional measures such as Pearson’s r, cδ does not have a fixed range, which makes interpretation more complex. The researchers propose rescaling to address this, but acknowledge that rescaling introduces sample dependence. The measure is also sensitive to outliers because it uses squared differences, and it becomes undefined if one dataset has no internal variation. Furthermore, cδ cannot differentiate between datasets with similar divergence patterns and those with perfectly opposed patterns. There is currently no established framework for statistical inference, such as hypothesis testing or confidence intervals. The mathematical formulation requires further clarification, empirical validation is limited, and there is potential overlap with existing measures such as the Gini mean difference.
Future work will focus on addressing outlier sensitivity, developing a standardised scale, creating a framework for statistical inference, clarifying the mathematical formulation, and testing more extensively on diverse datasets. The researchers also aim to adapt the measure for complex data, such as quantum data. In essence, this research presents a promising but still developing statistical measure. While the concept of focusing on divergence patterns is novel, significant limitations need to be addressed before widespread adoption. The work serves as a thorough exploration of the measure’s potential and a clear roadmap for future research.
Divergence Pattern Similarity, a New Coefficient
This research introduces the correlation-of-divergency coefficient, c-delta, a new statistical measure designed to quantify the similarity of internal divergence patterns between two groups of values. Unlike traditional correlation coefficients that assess direct associations between paired values, c-delta focuses on comparing how values differ within each group, revealing similarities in their variability structures. Researchers calculate, for each value, its divergence from all others in its group, then compare these patterns across the two groups being analysed. The development of c-delta addresses a gap in existing statistical methodology by providing a tool specifically designed to compare divergence patterns, with potential applications spanning diverse fields including physics, genetics, and social network analysis.
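The per-value step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' published formula: the function name `divergence_profile`, the `kind` parameter, and the choice to average over the other n - 1 values are all assumptions made here for clarity. It returns one divergence score per value, using either squared differences (with a square root) or absolute differences.

```python
import math


def divergence_profile(values, kind="squared"):
    """Return one divergence score per value, measured against all other
    values in the same group.

    Hypothetical helper for illustration only; the averaging over the
    other n - 1 values is an assumption, not taken from the paper.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to measure divergence")

    profile = []
    for i, v in enumerate(values):
        others = [w for j, w in enumerate(values) if j != i]
        if kind == "squared":
            # root mean square difference to every other value
            score = math.sqrt(sum((v - w) ** 2 for w in others) / (n - 1))
        elif kind == "absolute":
            # mean absolute difference to every other value
            score = sum(abs(v - w) for w in others) / (n - 1)
        else:
            raise ValueError("kind must be 'squared' or 'absolute'")
        profile.append(score)
    return profile


group = [1.0, 2.0, 3.0, 10.0]
print(divergence_profile(group, kind="squared"))
# absolute divergence of 10 from {1, 2, 3}: (9 + 8 + 7) / 3 = 8.0
print(divergence_profile(group, kind="absolute"))
```

Under this reading, the value 10 receives the largest score because it sits far from the rest of its group. Averaging the absolute-difference profile over all values gives the classical Gini mean difference of the group.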
Researchers can use this coefficient for benchmarking, clustering validation, and assessing the similarity of variability structures. Experiments demonstrate that a high c-delta value indicates strong similarity in divergence patterns: when a value in one group is distant from the others in that group, the corresponding value in the second group is similarly distant from its own group. Conversely, a low value suggests dissimilar or unrelated divergence patterns. The researchers also explored an alternative formulation using absolute differences, related to the Gini mean difference, which may be preferred in certain applications. This formulation replaces squared differences and square roots with absolute values.
Importantly, c-delta is not bounded between -1 and 1, unlike the Pearson and Spearman correlations, and must be interpreted accordingly. Its magnitude indicates the similarity of the divergence patterns, with larger values signifying greater similarity, irrespective of the research units involved. Traditional correlation methods focus on the association between paired observations, whereas c-delta focuses on the relationship between patterns of internal variability. One consequence is that c-delta will not recognise a completely inverse relationship as ‘different’, because the pattern of divergence in both sets is the same, although it comes from different sources. This makes c-delta suitable for comparing the structure of dispersion or variability between two datasets, rather than their direct association.
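The point about inverse relationships can be checked with a small sketch. It assumes, for illustration, that each value's divergence is the root mean square of its differences to the other values in its group. The helper names `rms_divergence` and `pearson_r` are hypothetical and written out here so the example is self-contained. When one group is the exact negation of the other, Pearson's r is -1, yet the two divergence profiles are identical.

```python
import math


def rms_divergence(values):
    """Per-value root mean square difference to the other values.

    Illustrative assumption, not the paper's exact definition. The
    self-difference is zero, so including it in the sum is harmless.
    """
    n = len(values)
    return [
        math.sqrt(sum((v - w) ** 2 for w in values) / (n - 1))
        for v in values
    ]


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


x = [1.0, 2.0, 4.0, 8.0]
y = [-v for v in x]  # perfectly inverse paired values

print(pearson_r(x, y))    # paired association is perfectly negative
print(rms_divergence(x))  # divergence profile of x
print(rms_divergence(y))  # same profile: squared differences ignore sign
```

Because every difference is squared, flipping the sign of a whole group leaves each value's divergence unchanged, so any coefficient built only from these profiles cannot tell the two cases apart.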
Divergence Pattern Similarity Quantified by c-delta
Scientists developed the correlation-of-divergency coefficient (c-delta) to assess whether the way values differ within one group is mirrored in another, rather than focusing on paired values directly. To do this, researchers calculate the squared differences between each value and all others in its group, sum these to quantify that value’s divergence, and then compare the resulting patterns across the two groups.
The c-delta coefficient is normalised by the average root mean square divergence within each group, which makes the measure scale-invariant and comparable across datasets. This normalisation allows meaningful comparisons even when the datasets have different ranges or units.
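The effect of normalising by the average root mean square divergence can be illustrated as follows. This is a sketch under stated assumptions, not the published c-delta formula: `rms_divergence`, `normalised_profile` and `illustrative_similarity` are hypothetical names, and combining the two profiles by averaging their products is only one plausible way to compare them. The sketch shows two properties the text describes: the normalised profile does not change when a group is rescaled, and it is undefined when a group has no internal variation.

```python
import math


def rms_divergence(values):
    """Per-value root mean square difference to the other values
    (illustrative assumption, not the paper's exact definition)."""
    n = len(values)
    return [
        math.sqrt(sum((v - w) ** 2 for w in values) / (n - 1))
        for v in values
    ]


def normalised_profile(values):
    """Divide each divergence by the group's average divergence."""
    profile = rms_divergence(values)
    mean_divergence = sum(profile) / len(profile)
    if mean_divergence == 0:
        # all values identical: nothing to normalise by
        raise ValueError("group has no internal variation")
    return [d / mean_divergence for d in profile]


def illustrative_similarity(xs, ys):
    """Hypothetical summary: mean product of the normalised profiles.

    NOT the authors' coefficient; used here only to show that a
    profile-based score need not lie between -1 and 1.
    """
    if len(xs) != len(ys):
        raise ValueError("groups must have the same number of values")
    px, py = normalised_profile(xs), normalised_profile(ys)
    return sum(a * b for a, b in zip(px, py)) / len(px)


small = [1.0, 2.0, 4.0, 8.0]
large = [100.0, 200.0, 400.0, 800.0]  # same shape, different units

print(normalised_profile(small))
print(normalised_profile(large))  # matches the profile above
print(illustrative_similarity(small, large))
```

In this sketch the score for two groups with the same shape is at least 1 and is never negative, which is consistent with the article's remark that c-delta has no fixed range, although the real coefficient may behave differently in detail.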
👉 More information
🗞 Correlation of divergency: c-delta. Being different in a similar way or not
🧠 ArXiv: https://arxiv.org/abs/2510.16717
