Quantum Boltzmann Machines represent a promising approach to machine learning, potentially offering significant speed advantages over traditional methods, but their training currently demands substantial computational resources. Daniëlle Schuman, Mark V. Seebode, and Tobias Rohe, alongside colleagues at LMU Munich and industry partners, address this challenge by developing a refined technique for training these quantum machines using parallel annealing. The team’s method dramatically reduces the required processing time, achieving a speed-up of nearly 70% over standard quantum Boltzmann Machine training, which in turn makes it feasible to test the technology on real-world medical images from the MedMNIST dataset. The work demonstrates that Quantum Boltzmann Machines trained with this improved parallel annealing can already match the results of conventional convolutional neural networks while requiring far fewer training cycles, bringing practical quantum machine learning for image classification closer to reality.
Quantum Annealing for Deep Learning Training
This research explores the potential of quantum and simulated annealing techniques for training deep learning models, particularly Boltzmann Machines and Convolutional Neural Networks. The investigation centres on whether these annealing methods can effectively train models for tasks like image classification, and how their performance compares to established methods such as stochastic gradient descent. A key focus is understanding the benefits and limitations of applying annealing to deep learning challenges. Boltzmann Machines, probabilistic generative models comprising visible and hidden units, are central to this work, with specific attention given to Restricted Boltzmann Machines (RBMs), where visible units connect only to hidden units, and Deep Boltzmann Machines (DBMs), which stack RBMs to create hierarchical representations. These models, while powerful, often suffer from computationally expensive training procedures, motivating the exploration of alternative optimisation strategies.
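To make these models concrete, here is a minimal sketch of the standard RBM energy function that training seeks to minimise (this is the textbook formulation, not code from the paper; the numpy variable names are illustrative):

```python
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Energy of an RBM configuration: E(v, h) = -a.v - b.h - v.W.h

    v: binary visible units, h: binary hidden units,
    W: visible-to-hidden weight matrix, a/b: bias vectors.
    Lower-energy configurations are more probable under the model.
    """
    return -(a @ v) - (b @ h) - (v @ W @ h)

# Toy example: 4 visible and 3 hidden units with random states and weights
rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=4)
h = rng.integers(0, 2, size=3)
W = rng.normal(scale=0.1, size=(4, 3))
a, b = np.zeros(4), np.zeros(3)
print(rbm_energy(v, h, W, a, b))
```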
The research aims to use annealing to determine the optimal weights within these models for classification tasks. Convolutional Neural Networks (CNNs), a class of deep neural networks particularly effective for image processing, serve as a benchmark for comparison, with ResNet architectures, including ResNet18 and ResNet50, being specifically examined. These ResNet models employ residual connections to mitigate the vanishing gradient problem, enabling the training of very deep networks. Variations of Deep Boltzmann Machines, such as Centered Convolutional Deep Boltzmann Machines and Contractive Slab and Spike Convolutional Deep Boltzmann Machines, are also explored, demonstrating a broad investigation of architectural modifications. These variations introduce convolutional layers and sparsity-inducing regularisation techniques to improve feature extraction and generalisation performance. Annealing methods form the core of this research, offering a different approach to optimisation than gradient descent. Quantum Annealing utilises D-Wave systems to identify the lowest energy state of a model, which corresponds to the optimal weights, by exploiting quantum mechanical phenomena like quantum tunnelling.
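Returning to the classical baseline mentioned above, a minimal PyTorch residual block illustrates the skip connection that lets ResNets train very deep networks (a simplified sketch, not the exact configuration benchmarked in the paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x).

    The identity shortcut lets gradients flow directly through the
    addition, mitigating the vanishing gradient problem in deep nets.
    """
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # the skip connection

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```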
Simulated Annealing, a classical optimisation algorithm inspired by the annealing process in metallurgy, provides a valuable point of comparison to quantum annealing, and is also available through D-Wave’s simulated annealing sampler. This algorithm iteratively explores the solution space, accepting worse solutions with a probability that decreases with temperature, allowing it to escape local optima. Traditional optimisation algorithms, including Stochastic Gradient Descent (SGD) and Adam, are used as baseline methods for performance comparison. The research highlights the difficulties gradient descent faces when learning long-term dependencies, a common problem in recurrent neural networks and deep models, suggesting that annealing methods may offer a viable alternative. Contrastive Divergence (CD), an efficient learning algorithm for training Boltzmann Machines, is employed, approximating the gradient of the log-likelihood function. Random search is used to identify optimal hyperparameters for the models, a robust but computationally expensive method for hyperparameter optimisation. The choice of hyperparameters significantly impacts model performance, necessitating careful tuning.
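The temperature-dependent acceptance rule at the heart of simulated annealing can be sketched in a few lines (a generic Metropolis criterion, assumed here for illustration rather than taken from the paper’s implementation):

```python
import math
import random

def sa_accept(delta_e, temperature):
    """Metropolis acceptance rule used in simulated annealing.

    Always accept improvements; accept a worse solution with
    probability exp(-delta_e / T), which shrinks as T cools,
    allowing early escapes from local optima.
    """
    if delta_e <= 0:
        return True
    return random.random() < math.exp(-delta_e / temperature)

# As the temperature drops, uphill moves become rarer:
for T in (10.0, 1.0, 0.1):
    rate = sum(sa_accept(1.0, T) for _ in range(10_000)) / 10_000
    print(f"T={T}: ~{rate:.2f} acceptance rate for delta_e=1.0")
```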
D-Wave systems are the primary quantum annealing hardware used in this research, with the Advantage processor being specifically referenced. The Advantage processor offers higher qubit connectivity and improved coherence times compared to previous generations, enabling more complex problems to be tackled. The minorminer library computes minor embeddings, allowing researchers to map the graph of their optimisation problem onto the D-Wave’s physical qubit graph. Several software tools are integral to the implementation, including D-Wave’s Ocean SDK, a comprehensive software development kit for quantum computing, and the deep learning framework PyTorch, which provides a flexible and efficient platform for building and training neural networks. LibSVM, a library for Support Vector Machines, is used for comparative evaluation, providing a well-established machine learning algorithm for benchmarking. Pymetis, a library for graph partitioning, is also employed, aiding in the mapping of complex problems onto the D-Wave’s hardware architecture. Effective problem mapping is crucial for achieving good performance on quantum annealers.
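As an illustration of problem mapping, the sketch below uses minorminer to embed a small fully connected problem graph onto a Pegasus topology like that of the Advantage processor (a toy example with assumed sizes, not the paper’s actual embedding):

```python
import networkx as nx
import dwave_networkx as dnx
import minorminer

# Logical problem: a fully connected 5-variable graph
source = nx.complete_graph(5)

# Target: a (locally generated) fragment of the Pegasus topology,
# the qubit graph family used by the Advantage processor
target = dnx.pegasus_graph(4)

# Find a minor embedding: each logical variable is assigned a
# "chain" of physical qubits that act as one variable
embedding = minorminer.find_embedding(source.edges, target.edges)
print(embedding)  # e.g. {0: [chain of qubits], 1: [...], ...}
```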
Weights & Biases (WandB) is used for tracking experiments and analysing results, providing a centralised platform for logging hyperparameters, metrics, and visualisations. The problem of training the models is formulated as a Quadratic Unconstrained Binary Optimization (QUBO) or Ising problem, which is suitable for solving with quantum annealers. QUBO and Ising formulations represent the optimisation problem as minimising an energy function with binary variables, aligning with the natural operation of quantum annealers. The research involves benchmarking the performance of quantum hardware for training Boltzmann Machines, assessing its potential speedup and accuracy compared to classical algorithms. ImageNet, a large-scale image database with over 14 million images, is used for training and evaluating Convolutional Neural Networks, providing a challenging benchmark for image classification. A dataset of breast ultrasound images is also used for a specific medical imaging task, demonstrating the applicability of these techniques to real-world problems. This medical imaging dataset allows for the evaluation of the models’ performance in a clinically relevant context.
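A toy QUBO makes the formulation concrete; here it is built with D-Wave’s Ocean SDK and solved with the simulated annealing sampler (the variables and coefficients are made up for illustration; a hardware sampler with an embedding composite would target the QPU instead):

```python
from dimod import BinaryQuadraticModel
from dwave.samplers import SimulatedAnnealingSampler

# Toy QUBO: minimise E(x) = -x0 - x1 + 2*x0*x1 over binary x0, x1.
# The minimum (energy -1) is reached when exactly one variable is 1.
bqm = BinaryQuadraticModel({'x0': -1.0, 'x1': -1.0},
                           {('x0', 'x1'): 2.0},
                           0.0, vartype='BINARY')

sampleset = SimulatedAnnealingSampler().sample(bqm, num_reads=100)
print(sampleset.first.sample, sampleset.first.energy)
```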
Standard classification metrics such as accuracy, precision, recall, and F1-score serve as evaluation measures, providing a comprehensive assessment of the models’ performance. Validation sets are used for hyperparameter tuning and model selection, preventing overfitting and ensuring generalisation to unseen data. This research builds upon established concepts and architectures in deep learning, including Convolutional Neural Networks and ResNets, while acknowledging the challenges of training deep networks, such as the vanishing gradient problem and computational cost. Existing work on Boltzmann Machines and their applications forms a foundation for this study, leveraging previous advancements in probabilistic modelling. Previous research on the use of quantum annealing and simulated annealing for optimisation problems is also referenced, along with related work on specific variations of deep Boltzmann Machines, such as Contractive Slab and Spike Convolutional Deep Boltzmann Machines, providing context for the current investigation.
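Computing these metrics is straightforward; a minimal sketch with made-up labels follows (scikit-learn is an assumption here, as the paper’s evaluation code is not shown):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 1, 1, 0, 1, 1]  # ground-truth labels (illustrative)
y_pred = [0, 1, 0, 0, 1, 1]  # model predictions (illustrative)

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
```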
This research investigates a novel approach to training deep learning models using annealing methods, offering a potential alternative to traditional gradient-based optimization. The study aims to compare the performance of annealing methods with established algorithms like Stochastic Gradient Descent and Adam, quantifying any potential benefits in terms of speed, accuracy, or generalisation. The research provides valuable insights into the capabilities and limitations of D-Wave systems for training deep learning models, and suggests that annealing methods may offer a viable solution for challenging optimization problems, particularly those with complex energy landscapes. Further research is needed to explore the scalability and robustness of these techniques, and to identify the types of problems for which annealing methods are most effective.
👉 More information
🗞 Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification
🧠 DOI: https://doi.org/10.48550/arXiv.2507.14116
