Researchers investigating repeated fair division have long sought to understand how individuals can optimally allocate resources when preferences are private. Simina Brânzei of Purdue University and Google Research, Reed Phillips of Purdue University, and colleagues present a detailed analysis of this dynamic between two players, exploring the limits of regret minimisation in a setting where one player strategically divides a resource and the other selects their preferred portion over multiple rounds. The work is significant because it demonstrates fundamental constraints on achieving sublinear regret against an arbitrary opponent, yet reveals tractable solutions and a clear hierarchy of performance bounds when the number of possible divisions is limited. By modelling learning rates and considering both public and private information, the study not only advances our understanding of online learning dynamics but also provides insights into the computational complexity of finding approximate Stackelberg allocations within the Robertson-Webb model.
This work centres on Alice’s ability to minimise her regret: the difference between her achieved utility and the optimal outcome she could attain with complete knowledge of Bob’s preferences, i.e. the Stackelberg value. The research demonstrates that if Alice employs unrestricted, arbitrary divisions, substantial improvement over a basic strategy is impossible: her regret is Ω(T/(log T)²) even when Bob adopts a simple, non-strategic approach. However, when Alice limits herself to a maximum of k cuts, the problem becomes more manageable and opens avenues for strategic learning. Analysis reveals that Alice’s performance is heavily influenced by her understanding of Bob’s strategic sophistication, specifically his “regret budget”, a measure of his own learning and adaptation. When Bob’s learning rate is public, a clear hierarchy of polynomial regret bounds emerges, determined by both the number of allowed cuts k and Bob’s regret budget. Conversely, if Bob’s learning rate remains private, Alice can still guarantee regret of O(T/log T), but any attempt to achieve a polynomially faster rate leaves her vulnerable to linear regret against certain strategic opponents. As a direct consequence of these online learning dynamics, the study characterises the randomised query complexity of finding approximate Stackelberg allocations with a constant number of cuts within the Robertson-Webb model. This framework offers insights into the computational cost of finding optimal solutions in repeated fair division, highlighting a fundamental trade-off between the potential for increased utility through more complex partitions and the challenge of learning Bob’s preferences within an increasingly intricate strategy space.
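To make the regret notion concrete, here is a minimal sketch of a single-cut round, assuming a toy setup: Alice values pieces by their length, Bob’s valuation is a hypothetical two-block density, and a myopic Bob always takes his preferred piece. None of the numbers come from the paper; the sketch only illustrates how regret against the Stackelberg value is tallied.

```python
# Toy model of a single-cut round: the cake is [0, 1], Alice values a piece
# by its length, and Bob values it under a hypothetical density that puts
# 80% of his mass on the right half.  All numbers here are illustrative
# assumptions, not taken from the paper.

def bob_value(a, b, heavy=0.8):
    """Bob's value of the interval [a, b]: `heavy` mass spread over
    [0.5, 1] and the remainder over [0, 0.5]."""
    light = 1.0 - heavy
    left = max(0.0, min(b, 0.5) - min(a, 0.5)) / 0.5 * light
    right = max(0.0, max(b, 0.5) - max(a, 0.5)) / 0.5 * heavy
    return left + right

def alice_utility(x):
    """Alice cuts at x; a myopic Bob takes the piece he values more and
    Alice keeps the other, earning its length."""
    if bob_value(0.0, x) >= bob_value(x, 1.0):
        return 1.0 - x  # Bob takes [0, x]
    return x            # Bob takes [x, 1]

# Stackelberg value: Alice's best single-cut utility with full knowledge
# of Bob's density (approximated on a grid).
grid = [i / 1000 for i in range(1001)]
stackelberg = max(alice_utility(x) for x in grid)

# Regret of the naive "always cut at 1/2" strategy over T rounds.
T = 1000
regret = T * stackelberg - sum(alice_utility(0.5) for _ in range(T))
print(stackelberg, regret)
```

With full knowledge Alice would cut just left of the point where Bob switches pieces; the regret tallied above is how much the naive strategy leaves on the table over T rounds.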
The study examines how Alice and Bob repeatedly partition a cake, with Alice cutting and Bob selecting his preferred portion in each round. To establish a baseline, the researchers first allowed Alice to employ arbitrary measurable partitions, revealing that strongly sublinear regret is impossible: Alice suffers regret of Ω(T/(log T)²) even against a myopic Bob. Restricting Alice to a maximum of k cuts significantly alters the learning landscape, enabling a more tractable analysis of her performance. This constraint allows Alice’s learning to be assessed in terms of her understanding of Bob’s strategic sophistication, specifically his regret budget. The research then focuses on scenarios where Bob’s learning rate is either public or private. When public, a hierarchy of polynomial regret bounds determined by k and Bob’s regret budget is established, modelling Bob’s decision-making as an online learning algorithm. In contrast, when Bob’s learning rate remains private, Alice can universally guarantee regret of O(T/log T), but attempts to achieve a polynomially faster rate leave her vulnerable to linear regret against certain Bob valuations. As a consequence, the randomised query complexity of finding approximate Stackelberg allocations with a constant number of cuts in the Robertson-Webb model is characterised. The lower bounds construct a value density function for Bob that incorporates a “spike”, testing Alice’s ability to discern his preferences and minimise her regret. In the measurable-cut game, where Alice can partition the cake into any two Lebesgue-measurable sets, achieving strongly sublinear regret is fundamentally impossible: Alice’s regret accumulates to at least Ω(T/(log T)²) even when facing a non-strategic Bob who consistently chooses his most preferred piece. Consequently, Alice cannot attain regret of O(T^(1−ε)) for any ε > 0.
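The role of a “spike” can be pictured with a toy sketch, assuming a hypothetical valuation (uniform mass plus a concentrated spike at an unknown point) and a naive binary-search strategy for Alice. This illustrates why a single cut reveals roughly one bit per round; it is not the paper’s construction or proof.

```python
# Toy illustration of a "spike" valuation: Bob's hypothetical density is
# uniform mass 0.4 plus a concentrated spike of mass 0.6 at an unknown
# point.  With a single cut, which piece a myopic Bob takes reveals which
# side of the cut the spike is on, so a naive Alice can binary-search for
# it.  The masses and the strategy are illustrative assumptions, not the
# paper's construction.

def bob_takes_left(cut, spike, spike_mass=0.6):
    """Myopic Bob: the value of [0, cut] is its uniform mass plus the
    spike mass when the spike lies left of the cut."""
    left = cut * (1.0 - spike_mass) + (spike_mass if spike < cut else 0.0)
    return left >= 1.0 - left

def locate_spike(spike, rounds):
    """Halve the interval known to contain the spike once per round."""
    lo, hi = 0.0, 1.0
    for _ in range(rounds):
        cut = (lo + hi) / 2
        if bob_takes_left(cut, spike):
            hi = cut  # Bob grabbed the left piece: the spike is left of the cut
        else:
            lo = cut
    return lo, hi

lo, hi = locate_spike(spike=0.3141, rounds=20)
print(lo, hi)
```

Each round yields one bit, so after r rounds the spike is pinned to an interval of width 2^(−r); the paper’s lower bound shows that, even with such information leaking out, unrestricted partitions still cannot achieve strongly sublinear regret.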
Shifting to the k-cut game, where Alice is restricted to at most k cuts, her performance improves markedly. With a constant number of cuts (k ≥ 2) and a myopic Bob, a deterministic strategy for Alice guarantees regret of O(√(Tk) log(Tk)). This bound is tight in T up to logarithmic factors, as the research also establishes a lower bound of Ω(√T · k^(3/2)) for any deterministic Alice strategy. With Bob’s regret budget parametrised as T^α, for k = 2 cuts Alice’s optimal regret is Õ(T^((2+α)/3)). For k ≥ 3 cuts, an upper bound of O(T^((3+α)/4) · k^(3/4) · (log T)^(1/2)) is established, alongside a lower bound of Ω(T^((2+α)/3)/k) on Alice’s regret. These results demonstrate a trade-off between the number of cuts allowed and the achievable regret, influenced by Bob’s learning rate. When Bob’s learning rate remains private, however, Alice can still guarantee o(T) regret, but the rate deteriorates to O(T · k^(3/4)/log T), signifying that concealing the learning rate introduces a penalty.

The persistent challenge of fairly dividing resources has long captivated mathematicians and economists, yet translating theoretical solutions into practical algorithms remains surprisingly difficult. This work offers a nuanced understanding of how repeated negotiations between two parties, one strategically aware, the other less so, impact the ultimate outcome, specifically focusing on minimising ‘regret’ in a cake-cutting scenario. What distinguishes this research is its exploration of the interplay between information and strategic advantage, revealing that achieving consistently good outcomes for the less-informed party is impossible without constraints on the number of possible divisions. Limiting those divisions, and crucially understanding the opponent’s learning capacity, unlocks the possibility of guaranteed, albeit potentially modest, improvements. This moves beyond purely theoretical fairness to consider the practical realities of imperfect information and bounded rationality.
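One way to picture sublinear regret against a myopic Bob in the cut-restricted game (here a single cut, for simplicity) is a generic explore-then-commit strategy: Alice tries a grid of cut points, records the utility each yields, then plays the empirically best one. This is a standard bandit-style sketch under assumed valuations, not the paper’s algorithm.

```python
# Explore-then-commit sketch for the single-cut game against a myopic Bob.
# A generic bandit-style strategy under a hypothetical valuation for Bob,
# not the paper's algorithm.

def alice_utility(cut, bob_value):
    """Alice cuts at `cut`; a myopic Bob takes his preferred piece and
    Alice's utility is the length of the piece she keeps."""
    if bob_value(0.0, cut) >= bob_value(cut, 1.0):
        return 1.0 - cut
    return cut

def explore_then_commit(T, grid_size, bob_value):
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    total = 0.0
    # Exploration: one round per grid point, recording realised utility.
    utilities = []
    for x in grid:
        u = alice_utility(x, bob_value)
        utilities.append(u)
        total += u
    # Commit: play the best observed cut for the remaining rounds.
    best = grid[max(range(grid_size), key=lambda i: utilities[i])]
    total += (T - grid_size) * alice_utility(best, bob_value)
    return total

# Hypothetical Bob: 70% of his value sits on [0.6, 1.0].
def bob_value(a, b):
    heavy, lo, hi = 0.7, 0.6, 1.0
    uniform = (b - a) * (1.0 - heavy)
    overlap = max(0.0, min(b, hi) - max(a, lo)) / (hi - lo) * heavy
    return uniform + overlap

T = 10_000
total = explore_then_commit(T, grid_size=100, bob_value=bob_value)
print(round(total / T, 3))  # average per-round utility
```

Against a fixed myopic Bob this earns close to the best single-cut utility per round; the subtlety the paper addresses is that a strategic Bob with his own regret budget can distort exactly the exploration phase such a strategy relies on.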
The limitations lie in the specific model used, a simple ‘cake’ and two players, which may not fully capture the complexities of real-world negotiations involving multiple parties or more nuanced valuations. Looking ahead, this work could inform the design of fairer automated negotiation systems, from online marketplaces to international trade agreements. Future research might explore how these dynamics change with more players, different types of resources, or the introduction of trust and reputation mechanisms. Ultimately, understanding the limits of fairness in strategic interactions is as important as striving for it.
🗞 Dueling over Multiple Pieces of Dessert
🧠 ArXiv: https://arxiv.org/abs/2602.11486
