Researchers are increasingly focused on ensuring the integrity of machine learning models, particularly as their use expands into sensitive applications. Nikolas Melissaris (CNRS), Jiayi Xu and Antigoni Polychroniadou (JPMorgan AI Research), Akira Takahashi, and Chenkai Weng present ZKBoost, a novel zero-knowledge proof of training (zkPoT) protocol designed specifically for XGBoost models. This work represents a significant advance: it provides the first cryptographic guarantee of correct XGBoost training on a committed dataset, without disclosing either the data itself or the model's parameters. Through a fixed-point XGBoost implementation, a generic zkPoT template, and a vector oblivious linear evaluation (VOLE)-based instantiation, the authors demonstrate practical zkPoT on real-world datasets while maintaining accuracy within one percent of standard XGBoost.
Verifying XGBoost model training via zero-knowledge proofs enables secure and trustworthy machine learning while preserving data privacy
Researchers have developed ZKBoost, the first zero-knowledge proof of training protocol for XGBoost, addressing a critical need for cryptographic guarantees of model integrity in sensitive applications. As machine learning models become increasingly prevalent, ensuring their trustworthiness and accountability is paramount, and this work provides a solution for verifying correct training without revealing private data or model parameters.
The study introduces a novel approach to prove that a model was genuinely obtained by training on a committed dataset with specified hyperparameters, preventing malicious shortcuts such as handcrafted models or unauthorized data manipulation. ZKBoost enables a provider to convince a verifier of this truth without disclosing any information beyond the validity of the training process.
A key innovation within ZKBoost is a fixed-point implementation of XGBoost, designed to be compatible with the arithmetic circuits required by zero-knowledge proof systems. Standard XGBoost relies on floating-point arithmetic, which presents challenges for cryptographic verification, but this new implementation utilizes deterministic, bounded-precision fixed-point calculations.
Empirical results demonstrate that this fixed-point version maintains accuracy within 1% of the standard floating-point XGBoost, a crucial achievement for practical application. This compatibility with arithmetic circuits allows for the efficient instantiation of zero-knowledge proofs, paving the way for trustworthy machine learning services and decentralized models.
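The core idea of circuit-friendly fixed-point arithmetic can be illustrated with a minimal sketch. The scale factor and helper names below are illustrative assumptions, not the paper's actual precision parameters: values are scaled by a power of two, and every operation becomes a deterministic integer computation that an arithmetic circuit can express.

```python
# Minimal sketch of deterministic, bounded-precision fixed-point
# arithmetic of the kind a circuit-friendly XGBoost relies on.
# The fractional-bit count F is an illustrative assumption.

F = 16                # fractional bits (assumed)
SCALE = 1 << F

def encode(x: float) -> int:
    """Map a real number to its fixed-point integer representation."""
    return int(round(x * SCALE))

def decode(v: int) -> float:
    return v / SCALE

def fx_add(a: int, b: int) -> int:
    return a + b      # addition needs no rescaling

def fx_mul(a: int, b: int) -> int:
    # Multiplying two scaled values yields scale 2^(2F); truncate back.
    return (a * b) >> F

a, b = encode(1.5), encode(2.25)
print(decode(fx_mul(a, b)))   # 3.375
```

Because truncation here is a fixed, deterministic rule rather than hardware-dependent floating-point rounding, prover and verifier always agree bit-for-bit on every intermediate value.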
Furthermore, the researchers developed CertXGB, a certification algorithm that abstracts arithmetic circuits for efficiently validating the model’s origin. This algorithm enables parallel validation of each tree in the XGBoost ensemble, significantly improving efficiency compared to sequentially re-executing the training procedure.
The generic design of CertXGB allows it to be integrated with any general-purpose zero-knowledge proof backend, offering flexibility and adaptability. This breakthrough has implications for a range of applications, including trustworthy machine learning-as-a-service, decentralized machine learning, and compliance with data restrictions.
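The parallelism CertXGB exploits can be sketched as follows. In real boosting, the gradients of round t depend on the trees of rounds 1..t-1, so presumably the prover commits to the per-round gradients as witnesses, which decouples the per-tree checks. Everything below (the "rebuild" rule, the toy data) is a hypothetical stand-in for the actual per-tree circuit check, not the paper's construction.

```python
# Sketch of CertXGB's parallelism idea: given committed per-round
# gradients, each tree in the ensemble can be validated independently.
# `rebuild_tree` is a toy stand-in for re-deriving a tree's structure.
from concurrent.futures import ThreadPoolExecutor

def rebuild_tree(gradients):
    # Toy rule: pretend "training" a tree is summing its gradients.
    return sum(gradients)

def validate_tree(committed_tree, gradients):
    # Re-derive one tree from its round's gradients and compare.
    return committed_tree == rebuild_tree(gradients)

grads = [[1, 2], [3, 4], [5, 6]]   # one gradient vector per round (toy)
committed = [3, 7, 11]             # committed ensemble (toy)

with ThreadPoolExecutor() as pool:
    ok = all(pool.map(validate_tree, committed, grads))
print(ok)  # True
```

The point of the sketch is the shape of the computation: the checks share no state, so they parallelize trivially, unlike sequentially re-executing the whole training loop.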
The work demonstrates the feasibility of applying zero-knowledge proofs to gradient boosted decision trees, specifically XGBoost, a widely used method for structured data. By achieving nearly identical accuracy to standard XGBoost while enabling cryptographic verification, ZKBoost represents a significant step towards ensuring the integrity and provenance of machine learning models in real-world deployments.
Fixed-point arithmetic and circuit construction for zero-knowledge proof of training are crucial for efficient and verifiable machine learning
A fixed-point XGBoost implementation underpins the development of ZKBoost, a zero-knowledge proof of training protocol. This implementation was specifically designed to be compatible with arithmetic circuits, a crucial step towards enabling efficient zkPoT for gradient boosted decision trees. Researchers achieved this by representing XGBoost’s calculations using fixed-point arithmetic, allowing for translation into a circuit-friendly format suitable for cryptographic verification.
The study then constructed a generic zkPoT template tailored for XGBoost training. This template facilitates instantiation with any general-purpose zero-knowledge proof backend, offering flexibility in selecting the cryptographic system. Central to this template is the ability to prove the correctness of each step in the XGBoost training process without revealing the underlying data or model parameters.
This was accomplished by breaking down the training procedure into a series of verifiable computations. To address the challenge of proving nonlinear fixed-point operations, the work introduced a vector oblivious linear evaluation (VOLE)-based instantiation. VOLE yields inexpensive commitments on which linear relations can be verified directly, and the nonlinear steps of fixed-point arithmetic are reduced to checks over these committed values.
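The correlation underlying VOLE-based proof systems can be sketched in a few lines. The prover holds a value x and a MAC tag m, the verifier holds a key k and a secret global offset Δ, with m = k + x·Δ over a field; tags and keys add component-wise, so commitments to linear combinations come for free. The field size and variable names below are illustrative, not the paper's choices.

```python
# Toy VOLE correlation over a prime field: for each committed value x,
#     m = k + x * Delta  (mod p),
# where the prover holds (x, m) and the verifier holds (k, Delta).
# Linear relations over committed values can be checked on tags/keys
# alone, without revealing x. Parameters here are illustrative.
import secrets

p = (1 << 61) - 1              # a Mersenne prime (illustrative)
Delta = secrets.randbelow(p)   # verifier's secret global key

def commit(x: int):
    """Return (prover's tag m, verifier's key k) for value x."""
    k = secrets.randbelow(p)
    m = (k + x * Delta) % p
    return m, k

# Commit to x1 and x2; a commitment to x1 + x2 is obtained "for free"
# by adding tags and keys, with no further interaction.
x1, x2 = 5, 7
m1, k1 = commit(x1)
m2, k2 = commit(x2)
m_sum, k_sum = (m1 + m2) % p, (k1 + k2) % p

# Verifier's check for the claimed opening x1 + x2:
assert m_sum == (k_sum + (x1 + x2) * Delta) % p
print("linear relation verified")
```

Soundness comes from Δ being secret: a prover who wants to open a commitment to a different value must guess Δ, which succeeds with probability roughly 1/p.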
This technique ensures that the verifier can confirm the correctness of these operations without learning any information about the inputs. Experiments demonstrated that the fixed-point implementation maintains accuracy within 1% of standard XGBoost, while simultaneously enabling practical zkPoT on real-world datasets.
This level of accuracy preservation is critical for ensuring the utility of the verified model, demonstrating that cryptographic verification does not significantly compromise performance. The methodology highlights a pathway to deploy robust and trustworthy machine learning models in sensitive applications.
Fixed-point arithmetic enables zero-knowledge proofs for XGBoost model training without revealing sensitive data
ZKBoost maintains accuracy within 1% of standard XGBoost implementations. Its fixed-point XGBoost implementation is compatible with arithmetic circuits, a crucial step towards enabling efficient zero-knowledge Proof of Training (zkPoT). The research introduces the first zkPoT protocol for XGBoost, allowing model owners to demonstrate correct training on a committed dataset without disclosing sensitive data or model parameters.
A key component of this work is a generic template, CertXGB, for zkPoT of XGBoost that can be integrated with any general-purpose Zero-Knowledge Proof (ZKP) backend. This certification algorithm efficiently verifies that a model was correctly generated by executing the fixed-point XGBoost algorithm on a given dataset.
Validation of each tree within the XGBoost ensemble can occur independently and in parallel, significantly improving efficiency. Furthermore, the study details a vector oblivious linear evaluation (VOLE)-based instantiation, demonstrating practical zkPoT performance on real-world datasets. This instantiation incorporates improved ZKP subcomponents for secure proof of non-linear fixed-point operations, including comparison, division, and truncation. The work also addresses potential security vulnerabilities related to arithmetic overflows, enhancing the robustness of the zkPoT process.
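A standard trick for proving operations like division and truncation inside an arithmetic circuit is for the prover to supply the quotient and remainder as witnesses, and for the circuit to verify a multiplicative relation plus a range bound on the remainder. The sketch below shows that check for plain integer division; it mirrors how such gadgets are typically built, and is not necessarily ZKBoost's exact construction.

```python
# Sketch of the witness-and-check pattern for proving q = a // b in an
# arithmetic circuit: the prover supplies (q, r) and the circuit checks
#     a == q*b + r   and   0 <= r < b   (a range check).
# The range check is what blocks a malicious quotient; it also reflects
# why overflow bounds matter for soundness.

def prove_division(a: int, b: int):
    # Prover's side: compute the witness honestly.
    q, r = divmod(a, b)
    return q, r

def verify_division(a: int, b: int, q: int, r: int) -> bool:
    # Verifier's side: multiplicative relation plus remainder range check.
    return a == q * b + r and 0 <= r < b

a, b = 1234567, 89
q, r = prove_division(a, b)
print(verify_division(a, b, q, r))      # True
print(verify_division(a, b, q + 1, r))  # False (malicious quotient)
```

Comparisons can be reduced to the same primitive: proving x < y amounts to proving that y - x - 1 lies in a bounded range, which again comes down to range checks on committed values.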
Zero-knowledge proofs preserve XGBoost model accuracy with fixed-point arithmetic while ensuring privacy
Gradient boosted decision trees represent a powerful technique for analyzing tabular data. A new protocol, ZKBoost, facilitates zero-knowledge Proof of Training for XGBoost models, addressing the increasing need for cryptographic guarantees of model integrity in sensitive applications. This system enables model owners to demonstrate correct training on a committed dataset without disclosing either the data itself or the model's parameters.
ZKBoost achieves this through a fixed-point XGBoost implementation compatible with arithmetic circuits, alongside a generic template for zero-knowledge Proof of Training applicable to various zero-knowledge proof backends. A key innovation lies in its use of vector oblivious linear evaluation to resolve challenges associated with proving nonlinear fixed-point operations.
Importantly, this fixed-point implementation maintains nearly identical accuracy to standard XGBoost, staying within 1% of the floating-point baseline, while simultaneously enabling practical cryptographic verification on real-world datasets. The authors acknowledge that the current implementation relies on specific cryptographic assumptions and optimizations.
Future work may focus on exploring alternative cryptographic primitives and further enhancing the efficiency of the protocol to broaden its applicability. Nevertheless, ZKBoost represents a significant step towards trustworthy machine learning, offering a means to verify model integrity and paving the way for applications requiring assurance of training provenance and data privacy.
👉 More information
🗞 ZKBoost: Zero-Knowledge Verifiable Training for XGBoost
🧠 ArXiv: https://arxiv.org/abs/2602.04113
