Quantum machine learning (QML) is the working name for the family of algorithms that combines quantum information processing with classical statistical learning. It uses quantum computers to encode data into quantum states, run circuits that compute kernel values or train parameters, and read out predictions or cluster assignments. The field has matured from theoretical curiosity in 2013 to a hybrid quantum-classical engineering practice in 2026, with established libraries, a growing benchmark literature, and a small but serious commercial deployment footprint.
Last updated: May 2026. This page is maintained as a living reference; figures, milestones, and roadmap dates are refreshed quarterly against the most recent vendor announcements and conference proceedings.
- QML is hybrid by default. A classical outer loop wraps a quantum subroutine; pure-quantum machine learning is research, not production.
- Four families dominate. Quantum kernel methods (production-ready), variational circuits and QNNs (deployable but training-fragile), generative models (research), reinforcement learning (pre-production).
- The advantage is geometric. Quantum kernels embed data in a 2^N-dimensional Hilbert space whose pairwise overlaps encode patterns no polynomial classical feature map can match.
- The swap test is the key primitive. One ancilla, one Hadamard, one controlled-SWAP, one Hadamard, one measurement: P(0) = (1 + |〈a|b〉|²) / 2 turns a quantum overlap into a classical kernel entry.
- Barren plateaus are the dominant training risk. Deep generic variational circuits produce flat loss landscapes; problem-aware ansatzes and local cost functions are the standard 2026 mitigations.
- The cost gap is five orders of magnitude. An effectively free laptop SVM versus hundreds of dollars and an overnight queue on a 2026 QPU. The discriminating question is whether the data has structure a quantum feature map captures that classical maps cannot.
- Provable advantage exists, narrowly. The Liu-Arunachalam-Temme discrete-log kernel gives a formal separation; many other “quantum advantage” claims collapse under proper cross-validation.
- Production fit is narrow but real. Quantum-native data (chemistry samples, encoded materials), structured kernels (finance), and the inner loop of hybrid pipelines are where QML actually ships in 2026.
What is quantum machine learning, exactly?
Quantum machine learning combines two fields. Classical machine learning is the toolbox of statistical-inference algorithms (regression, classification, clustering, generative models, reinforcement learning) that find patterns in data and extrapolate them to new inputs. Quantum information processing is the toolbox of quantum-circuit primitives (gates, measurements, entanglement, swap tests) that exploit quantum superposition and interference to compute things classical computers cannot easily compute.
QML is the family of algorithms that uses quantum-circuit primitives in the inner loop of a classical machine-learning algorithm, or that runs the entire learning pipeline on a quantum computer with classical post-processing. The 2013 Lloyd, Mohseni, and Rebentrost paper on supervised and unsupervised quantum machine learning (arXiv:1307.0411) launched the modern field, and the practical commercial implementations of 2026 are typically hybrid: classical orchestration (data loading, optimisation, convergence-check) plus quantum kernel computation or variational-circuit evaluation in the inner loop.
A brief history of quantum machine learning
The thirteen-year arc from theory to commercial deployment is short enough to remember, and worth knowing because the major architectural choices in 2026 QML libraries trace directly to specific papers. Reading the field as a chain of named results (rather than a vague “quantum is coming” narrative) is also how working practitioners decide which 2026 vendor and library to commit to.
The 1990s laid the algorithmic groundwork. Grover’s 1996 search algorithm gave the first concrete quadratic speedup for unstructured search, and the same primitive later became the engine of amplitude estimation, the basis for quantum Monte Carlo speedups in finance. Through the 2000s, Harrow, Hassidim, and Lloyd’s 2009 HHL algorithm for solving linear systems exponentially faster than classical approaches set the agenda for what an “exponential” quantum machine learning advantage might look like, and seeded a decade of follow-up work.
The modern QML era began in 2013 with Lloyd, Mohseni, and Rebentrost’s paper on supervised and unsupervised quantum machine learning, which packaged HHL as a primitive for distance estimation, k-means, and nearest-centroid classification. The same year, Rebentrost, Mohseni, and Lloyd’s quantum support vector machine formalised the kernel-method route. The 2017 Biamonte et al. Nature review consolidated the field and is still the most-cited single QML reference.
The 2018 to 2021 period reframed the practical landscape. Mitarai et al.’s parameter-shift rule made variational circuits trainable with the same back-propagation discipline as classical neural networks. McClean et al.’s barren-plateau result showed that random deep parameterised circuits produce exponentially vanishing gradients, killing many naive variational approaches and forcing the field toward structured ansatzes. Havlíček et al.’s 2019 Nature paper on supervised learning with quantum-enhanced feature spaces demonstrated quantum kernel methods on real IBM superconducting hardware, the first credible end-to-end QML run on a NISQ device. Cerezo et al.’s 2021 review of variational quantum algorithms in Nature Reviews Physics is the canonical engineering reference for everything since.
2022 to 2026 was the deployment era. Aaronson and Tang’s “dequantisation” line of work sharpened the conditions under which a claimed quantum speedup actually survives the existence of better classical algorithms. Huang et al.’s power-of-data paper (Nature Communications, 2021) articulated the modern position: quantum advantage in machine learning is real but conditional on data, encoder, and measurement protocol, and naive comparisons routinely overstate it. Production QML in 2026 is built on this more disciplined view: hybrid by default, kernel-first for tabular work, variational for chemistry and materials, and never deployed without a classical baseline first.
QML is the family of algorithms that uses quantum-circuit primitives in the inner loop of an otherwise-classical machine learning pipeline. The thirteen-year arc from Lloyd et al.’s 2013 theory paper to production deployments at JPMorgan, IBM, and Quantinuum is short enough to read as a chain of named results, and the modern view is that quantum advantage in ML is real but conditional on data, encoder, and measurement protocol.
QML 101: the 24 terms you need before reading the literature
Quantum machine learning has its own vocabulary, half borrowed from quantum information theory and half from classical machine learning. Skim this glossary once and the rest of this guide (and most of the literature) becomes much easier to read. For a deeper general-quantum vocabulary, see our top 20 quantum computing terms companion guide.
Foundation terms
Qubit. The basic unit of quantum information: a two-level quantum system that can occupy a superposition of |0〉 and |1〉 rather than only one or the other. Physical implementations include superconducting transmons, trapped ions, neutral atoms, and photons.
Superposition. The ability of a qubit to exist in a linear combination of |0〉 and |1〉 simultaneously, parameterised by a complex amplitude and phase. The amplitudes square to probabilities of measuring 0 or 1, and the phase carries the interference structure that classical bits cannot represent.
Entanglement. A correlation between two or more qubits whose joint state cannot be written as a product of individual qubit states. The resource that makes quantum computing more than probabilistic computing.
Quantum gate. A reversible operation that transforms qubit states. Single-qubit (X, Y, Z, H, S, T) and two-qubit (CNOT, CZ) gates form a universal set.
Quantum circuit. A sequence of gates applied to a register of qubits, ending in a measurement. The unit of execution on any quantum machine.
Hadamard gate. The single most-used QML gate. Maps |0〉 to an equal superposition of |0〉 and |1〉. The starting move of almost every QML circuit.
CNOT gate. The controlled-NOT: flips the target qubit if and only if the control qubit is |1〉. The default two-qubit entangling gate.
Measurement. A projective readout that collapses a qubit’s superposition into a classical 0 or 1 with a probability set by the state. Every QML model emits its prediction through a measurement, so the choice of observable and the number of shots together set the precision of the final answer.
Hilbert space. The mathematical space a quantum state lives in. For N qubits the Hilbert space has 2^N complex dimensions.
Expectation value. The mean of a measurement outcome over many repeated runs. Most QML cost functions are expectation values of an observable.
Shot. One run of a quantum circuit followed by one measurement. QML estimates expectation values by averaging over thousands or millions of shots.
Encoder and kernel terms
Encoder. The quantum subroutine that loads a classical data point x into a quantum state |φ(x)〉. Also called the feature map.
Feature map. A specific encoder choice (ZZ-feature map, IQP-style, hardware-efficient) that determines the geometry of the Hilbert-space embedding. The wrong feature map collapses the resulting quantum kernel to a classical-equivalent kernel and removes any quantum advantage before training even starts.
Quantum kernel. The squared overlap |〈φ(x)|φ(y)〉|² between two encoded states. Plays the same role as a classical kernel function.
Swap test. The quantum primitive that estimates a quantum kernel value with one ancilla qubit and a controlled-SWAP. Foundation of kernel-based QML.
Variational and operations terms
Ansatz. The chosen functional form of a parameterised quantum circuit. Hardware-efficient ansatz, ZZ ansatz, and UCC ansatz are common families.
Variational quantum circuit (VQC). A parameterised quantum circuit whose tunable angles are trained by a classical optimiser to minimise a cost function. The training loop is structurally identical to a classical neural network: forward pass on the QPU, gradient via parameter-shift, weight update on the CPU.
Parameterised quantum circuit (PQC). Synonym for VQC. The terms are interchangeable in the 2026 literature.
Parameter-shift rule. Mitarai et al. 2018: an exact analytical expression for the gradient of an expectation value with respect to a rotation angle, computed by evaluating the circuit at two shifted parameter values. The cost is two forward passes per parameter, which scales linearly and is what makes variational QML trainable on real hardware.
Barren plateau. The phenomenon where gradients in deep generic VQCs vanish exponentially with qubit count, killing trainability. The dominant 2026 VQC engineering problem.
Variational Quantum Eigensolver (VQE). The flagship variational algorithm for chemistry: train a PQC so its expectation value approximates the ground-state energy of a molecular Hamiltonian. VQE is the first QML family that has shipped as a production workload at IBM, Quantinuum, and Pasqal, and it anchors the chemistry case for 2026 enterprise buyers.
QAOA. Quantum Approximate Optimisation Algorithm. A specific PQC structure designed for combinatorial-optimisation problems (portfolio selection, MaxCut).
NISQ. Noisy Intermediate-Scale Quantum. The 2018-coined Preskill term for the current era of 50-1500-qubit machines without full error correction.
Hybrid quantum-classical. The dominant 2026 QML pattern: a classical computer drives the outer loop and a quantum machine evaluates a kernel value or circuit expectation in the inner loop. The closest classical analogue is GPU offload: the CPU runs most of the program, and the specialised accelerator handles the slice where it pays off.
Classical vs quantum machine learning, side by side
Quantum machine learning shares more vocabulary with classical machine learning than it shares mechanics. Both pipelines start with data, end with a prediction or a sample, and use a loss function and gradient descent in the outer training loop. Both rely on the same statistical-learning theory for generalisation bounds, the same train/validation/test discipline, the same metrics (accuracy, F1, AUC, log-loss), and the same upstream data engineering. If you handed a working QML team a classical ML problem they would solve it with familiar tools, and vice versa for the parts of QML that are not the quantum subroutine itself. The differences only appear once you look inside the inner loop, at the model and the way it is evaluated.
What changes inside the inner loop
Inside that inner loop the two diverge structurally. Classical ML stores parameters as real-valued floats in CPU or GPU memory and updates them with deterministic arithmetic; QML stores them as rotation angles inside a quantum circuit and reads gradients out via measurement statistics that come with shot noise. Classical ML feature maps live in real vector spaces; QML feature maps live in a complex Hilbert space whose dimension grows exponentially with the qubit count.
Classical inference is a deterministic forward pass; QML inference is a stochastic measurement that requires thousands of repetitions to converge on a single classical answer. Classical training cost grows roughly linearly in dataset size; QML training cost grows in shots per data pair, which means kernel methods scale quadratically with dataset size and become impractical above a few thousand examples.
The right mental model
The right mental model is that QML is not a faster classical computer, it is a different kind of computer that wears similar clothes. Where classical machine learning treats the model as a function from R^n to R^k that you fit by minimising a smooth loss, quantum machine learning treats the model as a unitary transformation on a 2^N-dimensional state space, sampled into a classical answer by measurement. The table below covers twelve dimensions where that distinction shows up in practical engineering decisions. If a row matters to your decision, the right place to dig further is the section of this article (or a linked source) that covers it in depth.
Twelve dimensions where it shows up
| Dimension | Classical ML | Quantum machine learning |
|---|---|---|
| Hardware | CPUs, GPUs, TPUs (mature, abundant, cheap) | Superconducting, trapped ion, neutral atom, photonic QPUs (scarce, expensive, queued) |
| Data representation | Tensors in classical memory | Quantum states in qubit registers; requires explicit encoder |
| Training algorithm | Backpropagation, SGD, Adam | Parameter-shift rule, classical SGD/Adam in the outer loop |
| Loss landscape | Local minima, saddle points (well-studied) | Barren plateaus in deep generic ansatzes; well-shaped in problem-aware ansatzes |
| Error rate | Effectively zero (deterministic floats) | 0.1 to 1 percent per two-qubit gate; mitigated through error mitigation or correction |
| Interpretability | SHAP, LIME, integrated gradients | Q-LIME, quantum Shapley values (active research) |
| Scalability with data | Excellent; gradient-boosting + deep learning scale to billions of examples | Bounded by per-shot cost; small to medium datasets only |
| Cost per inference | Microcents on a GPU, sub-millisecond latency | Cents to dollars on a QPU, seconds-to-minutes latency including queue |
| Talent pool | Millions of practitioners worldwide | Thousands of QML practitioners; growing 40+ percent year-over-year |
| Frameworks | PyTorch, TensorFlow, JAX, scikit-learn | PennyLane, Qiskit ML, TFQ, Classiq, Pulser, Perceval |
| Production fit | Any data-driven task at any scale | Quantum-native data, structured kernel problems, chemistry, optimisation |
| 2026 status | Mature, commodity | Emerging, hybrid-by-default, narrow advantage in select domains |
A concrete worked example
A concrete worked example makes the table tangible. Suppose the task is binary classification on a 200-point dataset with 16 features per point, the kind of problem a financial-fraud team might run in an afternoon. The classical pipeline picks scikit-learn’s RBF-kernel SVM, fits in roughly fifty milliseconds on a laptop, makes predictions in microseconds, and costs nothing beyond the laptop’s electricity.
The QML pipeline picks a 16-qubit ZZ-feature-map encoder, runs roughly 10,000 swap-test shots per kernel entry across a 200-by-200 kernel matrix, takes about a day of wall time on a 2026 IBM Heron-class machine including queue, fits the same classical SVM on the resulting quantum kernel matrix, and costs in the low hundreds of dollars at current cloud QPU pricing. Both pipelines produce a trained classifier; only one of them takes overnight and a budget request.
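To make the hand-off concrete, here is a minimal sketch of the classical half of that pipeline: however the kernel matrix was produced, it drops into scikit-learn's SVC as a precomputed kernel. The kernel_matrix stand-in below is purely classical so the snippet runs on a laptop; in the quantum pipeline each entry would instead be a swap-test estimate returned by the QPU, and the toy data and labels are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                # 200 points, 16 features (the worked example)
y = (X[:, 0] * X[:, 1] > 0).astype(int)       # toy labels for illustration only

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

def kernel_matrix(A, B):
    """Classical stand-in for the quantum kernel |<phi(a)|phi(b)>|^2.

    In the quantum pipeline every entry comes back from swap-test shots on a QPU;
    here an RBF-style similarity keeps the sketch runnable on a laptop.
    """
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.1 * sq)

K_train = kernel_matrix(X_tr, X_tr)           # 150 x 150, symmetric Gram matrix
K_test = kernel_matrix(X_te, X_tr)            # rows: test points, cols: training points

svm = SVC(kernel="precomputed").fit(K_train, y_tr)
print("test accuracy:", svm.score(K_test, y_te))
```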
Is the overhead worth it?
The question that follows is whether the quantum version is worth that overhead, and the honest answer in 2026 is “rarely, but in a small set of cases yes”. On random tabular data with no underlying quantum structure, the quantum kernel does not beat scikit-learn, and any apparent advantage in a small benchmark almost always collapses under proper cross-validation.
On data with structure that classical kernels cannot reproduce (encoded chemistry samples, time-series with deeply non-stationary correlations, certain financial-network features), the quantum kernel can outperform the best classical baseline by enough to justify the cost. The discriminating question is not “is the classifier accurate?”, it is “is the structure in the data such that a quantum feature map captures something a classical feature map provably cannot?”. Most production QML deployments today get this question wrong by picking a problem where the answer is no.
The 2026 hybrid default
The right reading of this table is that QML is not a replacement for classical machine learning, it is a specialised accelerator for a small set of problem classes where the inner-loop computation maps cleanly onto quantum primitives and where the data either is already quantum or can be encoded efficiently into a Hilbert space. The 2026 production pattern is hybrid: classical ML for the majority of the workload, quantum machine learning for the inner-loop steps where it can demonstrably beat the classical baseline. Teams that ship in this domain plan around hybridisation from day one, not as a fallback when the quantum side fails to deliver.
The four families of quantum machine learning algorithms
Quantum machine learning is not a single algorithm, it is a research programme that has fanned out into four distinct families since 2014. The reason that distinction matters is purely practical. Each family uses the quantum machine in a different way, demands a different qubit count, breaks under different noise profiles, and ships into a different production stack. Treating QML as one thing is the single most common conceptual error in 2026, and it is why the wrong family gets picked for an enterprise pilot more often than any other failure mode.
The four families are quantum kernel methods, variational quantum circuits and quantum neural networks, quantum generative models, and quantum reinforcement learning. They differ along three axes. The first is where the training happens: kernel methods train classically on top of quantum-computed similarities, while the other three train the quantum circuit itself. The second is what the circuit produces: a single number per data pair for kernels, a label for variational classifiers, a sample for generative models, an action for reinforcement learning. The third is how mature each family is as a production tool: kernel methods are deployable today on twenty to thirty qubits, variational circuits are deployable but unstable, generative models are research-grade with early production interest, and reinforcement learning is largely pre-production.
Knowing which family a given algorithm belongs to is the first decision in any QML project, because that choice determines which library, which qubit modality, and which cost structure the rest of the workflow inherits. Picking kernels means committing to a swap-test workload and a classical SVM wrapper; picking variational circuits means committing to parameter-shift gradient runs and the barren-plateau problem; picking generative or reinforcement means accepting a higher research burden in exchange for a chance at advantage on classically intractable distributions or environments. The four sections that follow unpack each family on its own terms, in order of how often they appear in real 2026 deployments.
Quantum kernel methods
The simplest and most-deployed family. The quantum machine computes kernel values (pairwise similarities between data points encoded as quantum states), and a classical SVM or k-nearest-neighbours wrapper does the actual classification or regression. Quantum kernel support vector machines are the canonical example, and the related quantum k-means clustering walkthrough shows the same swap-test primitive applied in an unsupervised setting. Kernel methods are easiest to deploy because they need only a small quantum circuit per kernel entry and the classical wrapper does the heavy lifting.
To see why kernels matter, start with the support vector machine, the classical workhorse the quantum version replaces in the inner loop. An SVM finds the hyperplane that separates two classes with the largest possible margin; only the points closest to the boundary (the support vectors) determine where it sits.
Kernel trick and quantum feature maps
The “kernel trick” lets a linear classifier handle non-linearly-separable data by lifting points into a higher-dimensional feature space where a hyperplane separates them, without ever computing the lift explicitly. A quantum kernel uses a quantum circuit as the feature map: the inner product is the squared overlap between two encoded states, and if the encoder is hard to simulate classically, the kernel is genuinely doing quantum work.
Before the swap test there is one piece of vocabulary to nail down: what a Hilbert space actually is. The figure caption above leans on the phrase “quantum Hilbert space” without explaining it, and so does the rest of the kernel-methods discussion, so the term is worth pinning down once. A Hilbert space sounds exotic but it is the simplest possible generalisation of the ordinary geometry every reader already knows.
In two dimensions, a point is a pair of real numbers (x, y), and two points have a dot product x₁x₂ + y₁y₂ that tells you how aligned they are. A Hilbert space is the same idea pushed two steps further: the coordinates can be complex numbers instead of real ones, and the number of dimensions can be larger than three, including infinite, while still keeping a well-behaved notion of length, angle, and inner product. Everything you learned in linear algebra still applies; the only change is that “dot product” is now called an “inner product”, and complex conjugation appears so that the length of a vector stays real and positive.
The vertical-bar-and-angle notation around a label, called bra-ket or Dirac notation, is just shorthand for vectors in this space. A ket |ψ⟩ is a column vector, the corresponding bra ⟨ψ| is its conjugate-transpose row vector, and the inner product ⟨a|b⟩ is what you get when you multiply a bra by a ket on the right. The notation pays off when computing overlaps and probabilities, because it makes the geometry visible at a glance.
The two basis kets |0⟩ and |1⟩ are by convention the column vectors (1, 0) and (0, 1), and they play the role of the classical bit values 0 and 1, but with the crucial difference that a qubit can sit at any unit-length complex combination of them. The combination is what gives a qubit its computational reach over a classical bit, and it is the reason every QML algorithm has to think about state preparation before it can think about training.
Physically these basis states are whatever two-level system the hardware has built: the ground and excited states of a trapped ion, spin-up and spin-down of an electron, horizontal and vertical polarisation of a photon, the lowest two energy levels of a superconducting transmon, or the |g⟩ and |r⟩ atomic levels of a Rydberg array. The Hilbert-space treatment is hardware-agnostic on purpose, which is part of why the same QML algorithm runs on IBM superconducting, IonQ trapped-ion, and QuEra neutral-atom backends without rewriting.
A single qubit lives in the two-dimensional Hilbert space spanned by the basis states |0⟩ and |1⟩, and any pure state of that qubit is a unit-length complex combination α|0⟩ + β|1⟩ with |α|² + |β|² = 1. The unit-length constraint is what makes the geometry probabilistic: the squared amplitudes are the measurement probabilities.
Two qubits live in a four-dimensional Hilbert space spanned by |00⟩, |01⟩, |10⟩, |11⟩; ten qubits live in 1,024 dimensions; thirty qubits live in just over a billion. Each extra qubit doubles the dimensionality, and that exponential is exactly why a quantum kernel can in principle encode patterns that no polynomial-size classical feature map can match. The space the data is being embedded into is not just big, it is structured in a way that classical computers cannot easily simulate.
The inner product between two quantum states is written ⟨a|b⟩, and it is a complex number whose squared modulus |⟨a|b⟩|² is the geometric quantity the rest of this section depends on. If |a⟩ and |b⟩ are identical, the inner product has magnitude one. If they are orthogonal (which means perfectly distinguishable by some measurement), the inner product is zero.
If they sit anywhere in between, the squared inner product is a continuous similarity score between zero and one, exactly the shape an SVM wants from a kernel. The geometry of the Hilbert space therefore gives quantum kernels for free: any quantum encoder produces a feature map whose pairwise similarities are already valid kernel values, with no extra mathematical machinery needed. What it does not give for free is a way to read those similarities out classically, and that is the gap the swap test fills.
The mechanical primitive that turns two encoded states into a kernel value is the swap test, and it deserves to be understood on its own terms before we go further. Two quantum states |a⟩ and |b⟩ sit in a high-dimensional Hilbert space, and what we want from them is a single classical number: |⟨a|b⟩|², the squared overlap, which is one if the states are identical, zero if they are orthogonal, and somewhere in between for everything else.
That number is the geometry the SVM cares about. The problem is that quantum mechanics will not let us read it out directly. There is no measurement that returns “the overlap of these two states”, because measurements collapse states rather than comparing them, and the no-cloning theorem means we cannot even make a second copy of one state to align with the other and check.
The swap test is the trick that gets around this. It uses one extra qubit, the ancilla, and one extra gate, the controlled-SWAP (also called the Fredkin gate), to convert an inner-product question into a measurement-probability question. The circuit has four steps. First, prepare the ancilla in |0⟩ and the two data registers in |a⟩ and |b⟩.
Second, apply a Hadamard gate to the ancilla, which puts it in the equal superposition (|0⟩ + |1⟩)/√2. Third, apply a controlled-SWAP: when the ancilla is |1⟩ it swaps the two data registers, when it is |0⟩ it leaves them alone. Fourth, apply a second Hadamard to the ancilla and measure it in the computational basis. The output is a single classical bit, either zero or one.
The magic is in why the answer comes out the way it does. After the first Hadamard, the state of the whole system is a superposition of (|0⟩|a⟩|b⟩ + |1⟩|a⟩|b⟩)/√2. The controlled-SWAP entangles the ancilla with whether the registers have been swapped, giving (|0⟩|a⟩|b⟩ + |1⟩|b⟩|a⟩)/√2.
The second Hadamard mixes the two ancilla branches, and what comes out the other side is an interference pattern in which the |0⟩ branch picks up the symmetric combination of |a⟩|b⟩ and |b⟩|a⟩ while the |1⟩ branch picks up the antisymmetric combination. The probability of measuring zero on the ancilla works out exactly to (1 + |⟨a|b⟩|²) / 2, and the probability of measuring one is (1 − |⟨a|b⟩|²) / 2. Rearrange and the overlap is just 2 P(0) − 1.
That last equation is the whole point. We never see the overlap directly, but a frequency we can measure (how often the ancilla reads zero across many shots) maps deterministically to the overlap we want. Running the circuit ten thousand times gives an estimate of P(0) accurate to roughly one percent, which is enough to drive an SVM on most real-world classification tasks.
Running it a hundred thousand times tightens that to about three significant figures. The ancilla overhead is one qubit; the gate overhead is a Hadamard, a controlled-SWAP, and a Hadamard; and the measurement overhead is one classical bit per shot. For a kernel of size N-by-N, that is N(N+1)/2 distinct circuits to estimate independently, each replayed for shot-count many runs, with the resulting kernel matrix shipped straight into a classical SVM solver.
There are two engineering caveats worth knowing. First, the swap test is not the only way to estimate an overlap. The Hadamard test, the inversion test (apply the unitary that prepares |a⟩, then the inverse unitary that prepares |b⟩, and measure all-zeros probability), and the compute-uncompute kernel of Havlíček et al. all give the same answer with different gate budgets, and on hardware-efficient ansätze the inversion test typically saves a factor of two in circuit depth by avoiding the controlled-SWAP entirely.
Second, the swap-test estimate is fundamentally a sampled quantity, so it inherits all the usual shot-noise statistics: standard error in the kernel entry falls as 1/√N_shots, which is why kernel-method deployments size their shot budgets at the kernel-matrix level rather than the per-circuit level. The trade-off is between time-on-machine and SVM accuracy, and it is the dominant cost item in a kernel-methods QML deployment today.
The swap test in detail
Three gates, one ancilla, one measured bit. Two unknown quantum states go in; the squared overlap |⟨a|b⟩|² comes out. The interference move at the second Hadamard is the part that does the real work, and the derivation below shows exactly how a single ancilla measurement carries the overlap.
The non-obvious feat is that the algorithm reads off |⟨a|b⟩|² without ever inspecting |a⟩ or |b⟩ directly. The no-cloning theorem forbids copying either state, and any measurement on the data registers would collapse the very superposition the encoder spent gates preparing, so the swap test instead entangles a fresh ancilla with a “data was swapped or not” branch and reads off only the ancilla.
The full circuit is three registers and three gate layers followed by a readout: a Hadamard on the ancilla, a controlled-SWAP between the two data registers, a second Hadamard on the ancilla, and a single ancilla measurement. The joint state evolves through four labelled snapshots, written below in compressed bra-ket form.
|ψ₀⟩ = |0⟩ |a⟩ |b⟩ (initialise)
|ψ₁⟩ = (1/√2) (|0⟩ + |1⟩) |a⟩ |b⟩ (H on ancilla)
|ψ₂⟩ = (1/√2) (|0⟩ |a⟩ |b⟩ + |1⟩ |b⟩ |a⟩) (controlled-SWAP)
|ψ₃⟩ = (1/2) [ |0⟩ (|a⟩|b⟩ + |b⟩|a⟩) + |1⟩ (|a⟩|b⟩ − |b⟩|a⟩) ] (H on ancilla)
The first Hadamard prepares the ancilla in an equal superposition without touching the data; the controlled-SWAP entangles the ancilla with the swapped-versus-not branch; the second Hadamard is the gate that earns the algorithm its result. Interfering the two ancilla branches forces the |0⟩ outcome to collect the symmetric combination |a⟩|b⟩ + |b⟩|a⟩ and the |1⟩ outcome to collect the antisymmetric one, so the |0⟩-versus-|1⟩ statistics directly encode how similar |a⟩ and |b⟩ are.
Reading off the |0⟩-outcome probability is then a matter of squaring the norm of the |0⟩ branch of |ψ₃⟩ and using the unit-norm condition ⟨a|a⟩ = ⟨b|b⟩ = 1. The four cross terms in the expanded inner product collapse to two copies of |⟨a|b⟩|² and two copies of 1, and the prefactor of 1/4 turns the sum into a closed form that depends only on the overlap.
P(0) = (1/4) · (⟨a|⟨b| + ⟨b|⟨a|) (|a⟩|b⟩ + |b⟩|a⟩)
= (1/4) · [ ⟨a|a⟩⟨b|b⟩ + ⟨a|b⟩⟨b|a⟩ + ⟨b|a⟩⟨a|b⟩ + ⟨b|b⟩⟨a|a⟩ ]
= (1/4) · [ 1 + |⟨a|b⟩|² + |⟨a|b⟩|² + 1 ]
= (1 + |⟨a|b⟩|²) / 2.
Two corollaries fall out of the closed form for free. Identical states (|⟨a|b⟩|² = 1) give P(0) = 1, so a single ancilla measurement deterministically confirms two states are the same; orthogonal states give P(0) = 1/2, which is maximum-variance binomial noise and demands the largest shot budget to estimate. The small and middling overlaps that sit near a downstream classifier’s decision boundary are therefore the kernel entries that consume most of the shot budget in any real deployment, which is the practical lesson the textbook derivation buys you.
Shot budgets follow directly from that bound. Estimating |〈a|b〉|² to additive precision ε with confidence 1 minus δ takes on the order of log(1/δ) divided by ε² shots by Hoeffding’s inequality, so two-decimal-place precision (around 0.01) needs roughly 10,000 shots per kernel entry. A 200-by-200 kernel matrix therefore runs to about 400 million shots, which on a 5,000-shots-per-second IBM Heron-class machine is roughly a day of wall time. Those numbers are the reason kernel methods stay tractable on NISQ hardware: small circuits, large but parallelisable shot budgets, and a classical SVM downstream that does not care if the kernel arrives in pieces.
Three practical pitfalls are worth knowing before deploying a swap-test pipeline. The controlled-SWAP itself decomposes into three CNOTs plus a small constant of single-qubit gates per data-register qubit, so the gate-count overhead scales linearly with register width and can dominate the depth budget on tight-coherence hardware. Ancilla decoherence between the two Hadamards leaks information straight out of the measurement, so the ancilla should be the lowest-error qubit on the chip whenever the routing lets you choose it. Readout fidelity also multiplies through the whole shot budget, so a 95 percent readout fidelity inflates the effective shot count to reach a fixed precision by roughly eleven percent over the ideal-readout case.
The 2026 production libraries hide most of this behind a single primitive. Qiskit Machine Learning’s FidelityQuantumKernel with the ComputeUncompute fidelity backend computes the same overlap through a compute-uncompute trick that often saves an ancilla on NISQ hardware. PennyLane’s qml.kernels.kernel_matrix defaults to a Hilbert-Schmidt inner product that is mathematically equivalent for unit-norm states, and the same code path runs against Xanadu, IBM, IonQ, and AWS Braket simulators without changes. Both libraries also support the destructive swap test (the Bell-basis-measurement variant) when the data states can be discarded after measurement, which removes the controlled-SWAP entirely and halves the depth at the cost of losing the post-measurement state.
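The closed form is easy to sanity-check without any quantum SDK. The sketch below builds the swap-test circuit for two single-qubit data registers in plain numpy, computes P(0) exactly from the statevector, confirms it equals (1 + |⟨a|b⟩|²)/2, and then samples a finite shot budget to show the shot noise described above. The register sizes and shot count are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_qubit_state():
    """Random single-qubit pure state (complex amplitudes, unit norm)."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

a, b = random_qubit_state(), random_qubit_state()
overlap_sq = abs(np.vdot(a, b)) ** 2                   # the quantity the swap test estimates

# Registers: qubit 0 = ancilla, qubit 1 = |a>, qubit 2 = |b> (ancilla is the leading kron factor)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I4 = np.eye(4)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])
H_anc = np.kron(H, I4)                                 # Hadamard on the ancilla only
CSWAP = np.block([[I4, np.zeros((4, 4))],              # identity when the ancilla is |0>
                  [np.zeros((4, 4)), SWAP]])           # swap the data registers when it is |1>

psi0 = np.kron(np.array([1, 0]), np.kron(a, b))        # |0>|a>|b>
psi3 = H_anc @ CSWAP @ H_anc @ psi0                    # H, controlled-SWAP, H (applied right to left)

p0 = np.sum(np.abs(psi3[:4]) ** 2)                     # probability the ancilla reads 0
print("P(0) from the circuit:  ", p0)
print("(1 + |<a|b>|^2) / 2:    ", (1 + overlap_sq) / 2)

# Finite shot budget: estimate the overlap from sampled ancilla bits
shots = 10_000
samples = rng.random(shots) < p0
overlap_est = 2 * samples.mean() - 1                   # invert P(0) = (1 + |<a|b>|^2) / 2
print("overlap from", shots, "shots:", overlap_est, " exact:", overlap_sq)
```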
Worked example: k-means on six unit-circle points
The two diagrams below show the parallel between classical and quantum k-means clustering, the canonical worked example for kernel-methods QML. Both share the same Lloyd outer loop (alternate assignment and update steps until centroids stop moving); the quantum version replaces the classical distance computation with a swap test that decodes a probability into a squared cosine similarity. For the full end-to-end implementation including the 30-line Qiskit simulator and a step-by-step comparison of both paths, see our companion quantum k-means clustering walkthrough.
To make the abstract pipeline concrete, here is the toy worked example used throughout the walkthrough: six data points sitting on the unit circle in two classes, plus one test point. The whole point of the unit-circle data is that it lives natively as a quantum state, so amplitude encoding is essentially free.
Per-kernel-entry depth and shot budgets
Two practical numbers anchor a kernel-method deployment. The first is the per-kernel-entry circuit depth: typically ten to fifty two-qubit gates for a ZZ-feature-map encoder on twenty to thirty qubits, well within the coherence budget of any 2026 superconducting or trapped-ion machine. The second is the shot count: estimating a single overlap to two-decimal-place precision needs roughly 10,000 shots, so a 200-by-200 kernel matrix runs to about 400 million shots, which on a 5,000-shots-per-second IBM Heron-class machine is about a day of wall time. Those numbers are why kernel methods stay tractable on NISQ-era hardware: small circuits, large but parallelisable shot budgets, and a classical SVM that does not care if the kernel arrives in pieces.
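The sizing arithmetic above is worth keeping as a three-line calculation. The sketch below reproduces the same order of magnitude from the Hoeffding-style bound (shots per entry on the order of ln(2/δ)/(2ε²)); the 5,000-shots-per-second throughput figure is the one quoted in the text and should be swapped for your backend's measured rate.

```python
import math

def kernel_shot_budget(n_points, eps=0.01, delta=0.05, shots_per_second=5_000):
    """Back-of-envelope shot budget for an n_points x n_points quantum kernel."""
    shots_per_entry = math.ceil(math.log(2 / delta) / (2 * eps**2))   # Hoeffding bound
    entries = n_points * (n_points + 1) // 2                          # symmetric matrix: upper triangle
    total_shots = shots_per_entry * entries
    hours = total_shots / shots_per_second / 3600
    return shots_per_entry, total_shots, hours

print(kernel_shot_budget(200))   # roughly 1.8e4 shots/entry, ~4e8 shots total, ~20 hours of machine time
```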
Variational quantum circuits and quantum neural networks
The most-published family. A parameterised quantum circuit (PQC) takes encoded data and emits a measurement result that depends on its tunable angles. Variational quantum circuits map directly onto neural-network training: define a loss function, compute gradients via the parameter-shift rule or back-propagation, update the angles with a classical optimiser, repeat until convergence. Quantum neural networks are PQCs trained for classification or regression, and the growing testing-framework literature reflects how seriously the modality is now taken in production.
The mental model the quantum version inherits is the classical feedforward neural network: stacked linear-plus-nonlinear layers trained by backpropagation. Each hidden neuron computes y = σ(Wx + b) and the network learns by adjusting all the weights jointly to minimise a loss on the training data.
QML training reuses the same outer machinery, gradient descent on a loss landscape, but the loss landscape has a structural problem unique to quantum: barren plateaus. In a well-shaped classical landscape (left below) the gradient points down a clear slope and the optimiser converges quickly. In deep generic variational quantum circuits (right) the gradients average to near-zero across exponentially much of the parameter space, and descent makes no progress. Most contemporary VQC research is about ansatz choices that avoid this trap.
The trainability piece needs more depth, because it determines what is and is not feasible. The parameter-shift rule (Mitarai et al., Phys. Rev. A 2018) computes the gradient of an expectation value with respect to a single rotation angle exactly, by evaluating the same circuit at two shifted parameter values. This means VQCs are trainable with the same back-propagation discipline as classical neural networks, with no finite-difference approximation. The cost is two circuit evaluations per parameter, which scales linearly in parameter count and stays manageable in practice: a hundred-parameter circuit takes two hundred forward passes per gradient step, and modern Qiskit Runtime sessions batch these to amortise calibration overhead.
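A one-parameter example makes the rule concrete. For the circuit RY(θ)|0⟩ measured in Z, the expectation value is cos θ and the exact derivative is −sin θ; the sketch below, in plain numpy rather than any quantum SDK, checks that the two shifted evaluations reproduce it.

```python
import numpy as np

def expval_z(theta):
    """<Z> for the state RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>; equals cos(theta)."""
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    z = np.array([[1, 0], [0, -1]])
    return state @ z @ state

def parameter_shift_grad(f, theta, shift=np.pi / 2):
    """Exact gradient of a single rotation angle: [f(theta + s) - f(theta - s)] / 2."""
    return (f(theta + shift) - f(theta - shift)) / 2

theta = 0.7
print("parameter-shift gradient:", parameter_shift_grad(expval_z, theta))
print("analytic -sin(theta):    ", -np.sin(theta))
```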
Barren plateaus, formalised in McClean et al.’s 2018 Nature Communications paper, give the negative result: for a sufficiently deep parameterised circuit drawn from a 2-design, the variance of any cost-function gradient decays exponentially in the number of qubits. The practical reading is that random initialisations of generic deep ansatzes do not train.
The mitigations now considered standard in 2026 are problem-aware ansatzes (hardware-efficient circuits restricted to the connectivity graph), local cost functions (computed on a small subset of qubits to keep gradients addressable), layer-wise pre-training that brings the circuit close to a useful starting region before joint optimisation, and explicit symmetry constraints that confine the optimisation to a smaller manifold. The combination is what made variational quantum eigensolvers (VQE) for chemistry the first VQC family to ship as a production workload.
The other 2020-era architectural innovation worth knowing is data re-uploading, introduced by Pérez-Salinas, Cervera-Lierta, Gil-Fuster, and Latorre in the Quantum journal. Instead of injecting the classical input only once at the start of the circuit, the data is interleaved with trainable layers and re-encoded several times, which proves to be enough to make even a single qubit a universal classifier. The pattern is the quantum analogue of a feedforward network with residual connections, and most 2026 PennyLane and Qiskit-based VQC ansatzes now use either pure data re-uploading or a hybrid encoding-plus-re-uploading scheme.
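A minimal sketch of the pattern, assuming PennyLane and its default.qubit simulator are available: the input x is re-encoded before every trainable rotation on a single qubit, and the Z expectation value is the classifier output. The layer count and the RZ/RY gate choices here are illustrative, not the published ansatz.

```python
import pennylane as qml
from pennylane import numpy as np   # PennyLane's autograd-aware numpy

n_layers = 3
dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, diff_method="parameter-shift")
def reupload_classifier(x, weights):
    """Single-qubit data re-uploading: encode x, rotate, repeat, then measure Z."""
    for layer in range(n_layers):
        qml.RZ(x, wires=0)                    # re-encode the input in every layer
        qml.RY(weights[layer], wires=0)       # trainable rotation
    return qml.expval(qml.PauliZ(0))

weights = np.array([0.1, -0.4, 0.8], requires_grad=True)
print(reupload_classifier(0.5, weights))                        # prediction in [-1, +1]
print(qml.grad(reupload_classifier, argnum=1)(0.5, weights))    # gradient via the parameter-shift rule
```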
A close cousin of VQE that has gained traction in 2024 to 2026 is Variational Quantum Imaginary Time Evolution (VQITE), introduced by McArdle, Yuan, Endo, and Benjamin in 2019. Where VQE searches the variational manifold for an energy minimum by gradient descent, VQITE simulates the non-unitary imaginary-time propagator exp(-Hτ) projected onto the manifold, which corresponds to a McLachlan-principle update of the parameters and often converges to better minima than vanilla VQE on hard chemistry instances. The technique is the basis of the 2026 Quantinuum and IBM chemistry pipelines for difficult intermediate-sized molecules where the VQE landscape stalls or under-converges, and it is the standard fallback when the ansatz hits a stubborn local minimum.
Quantum generative models
The fastest-growing family. Quantum Generative Adversarial Networks (QGANs), quantum Boltzmann machines, and quantum-classical hybrid generative models use quantum circuits to sample from learned distributions. Quantum generative advantage demonstrations have shown that QGANs can sample from distributions that classical models struggle with on small but instructive examples, and the entangled-circuit performance enhancement work has begun closing the gap to classical GAN baselines on real-world data.
The architecture is a direct port of the classical GAN: a generator (a parameterised quantum circuit) produces samples from a target distribution, and a discriminator (either classical or quantum) tries to distinguish them from real samples drawn from training data. Both networks train against each other in a minimax game until the generator’s distribution matches the data distribution closely enough that the discriminator cannot tell the difference. Lloyd and Weedbrook’s 2018 quantum-GAN proposal is the canonical reference, and the published 2026 implementations on superconducting hardware are getting useful Wasserstein-distance numbers on small but realistic financial time-series and medical-imaging benchmarks.
The third major generative architecture is the Quantum Circuit Born Machine (QCBM), introduced by Benedetti et al. in 2019. A QCBM is a parameterised quantum circuit whose output distribution (the Born-rule probabilities of computational-basis measurements) is trained directly against a target distribution using a maximum mean discrepancy or Kullback-Leibler loss, with no separate discriminator network. QCBMs are simpler to train than QGANs because they sidestep the minimax instability, and they are the dominant generative-model choice in 2026 for portfolio-distribution sampling, quantum-finance Monte Carlo, and synthetic-data generation in regulated domains.
Quantum reinforcement learning
The most experimental family. Quantum policy networks, quantum Q-learning, and projected entangled pair states (PEPS) for reasoning-trace coherence are the dominant approaches. Quantum-RL-inspired PEPS for LLM reasoning is the most-cited recent application, and the modality is at the same point trapped-ion was in 2018: theoretically interesting, practically embryonic, but moving fast.
The why-it-might-work argument is that classical reinforcement learning is dominated by exploration cost: an agent has to try many actions in many states to estimate a value function. Quantum amplitude amplification gives a quadratic speedup for searching the action space, and quantum walks accelerate Markov-decision-process exploration in structured environments. The why-it-is-hard argument is that real RL workloads run on environments orders of magnitude larger than current QPUs can hold, and the most realistic 2026 deployments are quantum-inspired classical algorithms that borrow the structure but execute on GPU clusters.
The two dominant paradigms (variational and kernel-based) sit on the same encoder but use the quantum machine differently: variational training treats the circuit itself as the model and learns its parameters end-to-end with parameter-shift gradients; kernel-based QML treats the circuit as a fixed feature map and offloads training to a convex classical SVM. The trade-off is expressivity versus training stability: variational circuits can in principle approximate richer functions, but kernel methods avoid the barren-plateau trap and converge convexly, which is why kernel methods dominate the production deployments at JPMorgan, HSBC, and most other 2026 finance-sector buyers.
The four families split by what the quantum machine does in the inner loop: compute kernel values (most-deployed), train a parameterised circuit (most-published), sample from a learned distribution (fastest-growing), or explore an environment (most experimental). The practical 2026 default is kernel-first for tabular work, variational for chemistry, generative for finance Monte Carlo, and a deliberate skip on quantum reinforcement learning until the action spaces shrink.
Seven more QML families nobody talks about
The four-families taxonomy above (kernel, variational, generative, reinforcement) is the standard textbook split, but it leaves out a handful of QML approaches that have meaningful 2026 research traction and that you will hit if you go deeper than the introductory literature. Knowing they exist is the difference between sounding like a beginner and sounding like a practitioner who has read the conference proceedings.
Quantum extreme learning machines. A randomly-initialised, untrained quantum circuit acts as a fixed feature map; only a classical readout layer is trained. Cheap, fast, and surprisingly competitive on small benchmarks where the random embedding happens to be informative.
Quantum reservoir computing. A continuously-driven quantum system (an analog QPU like a Pasqal neutral-atom array) feeds its time-series output into a classical readout. The dominant 2026 use is sequence prediction on physiological and financial time-series data.
Quantum boosting. Generalisation of classical boosting to weak quantum learners. Theoretical interest more than production deployment in 2026, but the underlying weak-learner-aggregation pattern is genuinely useful in noisy-NISQ regimes.
Clifford (stabiliser) QML. QML algorithms restricted to the Clifford-circuit subset, which means the circuits are classically simulable but the algorithm-design discipline is identical to general QML. Useful for prototyping at scale before moving to non-Clifford hardware.
Projected quantum kernels. Compute the standard quantum kernel, then project it onto a polynomial classical feature space. Closes some of the exponential-concentration gap that pure quantum kernels hit at large qubit counts.
Quantum natural language processing (QNLP). Compositional distributional natural-language processing built on tensor-product grammars. Quantinuum’s open-source lambeq library compiles sentences into quantum circuits whose measurement outcomes encode meaning, and the 2024 to 2026 commercial deployments target semantic search and patent prior-art retrieval.
Quantum graph neural networks. Generalisations of message-passing graph neural networks to parameterised quantum circuits, with the input graph’s connectivity setting the entanglement pattern of the ansatz. Active 2024 to 2026 research at NVIDIA, IBM, and Caltech; the canonical use cases are small-molecule property prediction and citation-network classification.
How each QML algorithm actually works (101)
The QML literature names dozens of algorithms but only a small core ships as production-ready in 2026. The eight below are the canonical mechanics every practitioner should know, in the same order a 2026 IBM, Quantinuum, or Xanadu applications engineer would walk a new hire through them. Each subsection follows the same shape: what the algorithm solves, how it works step by step, and where it actually ships in 2026.
Quantum kernel SVM (QSVM)
What it solves. Classification problems where the data classes are not linearly separable in the raw feature space. QSVM extends the classical support vector machine by computing the kernel matrix on a quantum computer rather than via a classical kernel function, which opens the door to feature-space embeddings that no polynomial classical kernel can match.
How it works. The algorithm is a four-step hybrid pipeline that swaps the classical kernel function for a quantum-circuit-based one. The quantum part is bounded to step 2 below; the classical SVM does the actual training and prediction without knowing it is reading quantum-derived numbers.
- Pick a quantum encoder (the ZZ feature map is the production default) that maps each data point x to a quantum state |φ(x)〉.
- For every pair (x_i, x_j) in the training set, compute the squared overlap K(x_i, x_j) = |〈φ(x_i)|φ(x_j)〉|² using the swap test or a compute-uncompute fidelity primitive; the assembled matrix is the quantum kernel.
- Pass the kernel matrix to a classical SVM (sklearn’s SVC with kernel="precomputed") which finds the maximum-margin hyperplane in the implicit Hilbert space.
- For each new test point, compute its kernel value against every training point and let the SVM predict.
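A compact sketch of steps 1 to 4 using the qiskit-machine-learning and scikit-learn classes named in this section; exact import paths can shift between releases, so treat them as indicative, and the toy data is made up. By default the fidelity kernel runs on a local simulator; real hardware means wiring a backend-aware sampler into the fidelity primitive.

```python
import numpy as np
from sklearn.svm import SVC
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel

rng = np.random.default_rng(0)
X_train, y_train = rng.uniform(0, np.pi, (40, 2)), rng.integers(0, 2, 40)   # toy data
X_test = rng.uniform(0, np.pi, (10, 2))

# Step 1: ZZ feature map encoder (the production default named above)
feature_map = ZZFeatureMap(feature_dimension=2, reps=2)

# Step 2: quantum kernel entries K(x_i, x_j) = |<phi(x_i)|phi(x_j)>|^2
qkernel = FidelityQuantumKernel(feature_map=feature_map)
K_train = qkernel.evaluate(x_vec=X_train)                 # 40 x 40 Gram matrix
K_test = qkernel.evaluate(x_vec=X_test, y_vec=X_train)    # test-versus-train kernel values

# Steps 3-4: classical SVM on the precomputed quantum kernel
svm = SVC(kernel="precomputed").fit(K_train, y_train)
print(svm.predict(K_test))
```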
2026 status. QSVM is the most-deployed QML algorithm in production, anchoring the credit-risk-scoring and fraud-detection pipelines at JPMorgan Chase, HSBC, and several Wells Fargo IBM-Network publications. The Qiskit FidelityQuantumKernel plus sklearn SVC combination shown in the Working Code section is the canonical reference implementation.
Variational Quantum Classifier (VQC)
What it solves. The same classification problem as QSVM, but treats the parameterised quantum circuit itself as the model rather than as a kernel evaluator. The advantage is more expressivity per qubit; the cost is exposure to the barren-plateau trap during training.
How it works. Training a VQC is structurally identical to training a classical neural network, with the parameter-shift rule replacing back-propagation as the gradient mechanism. Every iteration evaluates the circuit twice per parameter to get an exact analytical gradient, then applies a classical optimiser update.
- Choose an encoder (angle embedding for shallow circuits, amplitude or feature-map encoding for richer representations) that loads x into qubits.
- Define a parameterised ansatz U(θ) (strongly-entangling layers, hardware-efficient ansatz, or problem-aware ansatz) with trainable angles θ.
- Measure an observable (typically Pauli-Z on the first qubit) to get a real-valued prediction in [-1, +1].
- Compute a loss (mean squared error or cross-entropy) between predictions and labels on a training batch.
- Compute gradients of the loss with respect to θ using the parameter-shift rule (two circuit evaluations per parameter).
- Update θ with Adam, SGD, or quantum natural gradient; repeat until the loss converges.
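The loop above, sketched with PennyLane's default.qubit simulator and parameter-shift gradients. The ansatz (angle embedding plus strongly entangling layers), the optimiser settings, and the toy labels are illustrative choices rather than a recommended production configuration.

```python
import pennylane as qml
from pennylane import numpy as pnp   # autograd-aware numpy for the trainable parameters
import numpy as np

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, diff_method="parameter-shift")
def vqc(x, weights):
    qml.AngleEmbedding(x, wires=range(n_qubits))                  # step 1: encoder
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))  # step 2: ansatz U(theta)
    return qml.expval(qml.PauliZ(0))                              # step 3: observable in [-1, +1]

def loss(weights, X, y):                                          # step 4: mean squared error
    errors = [(vqc(x, weights) - t) ** 2 for x, t in zip(X, y)]
    return sum(errors) / len(errors)

rng = np.random.default_rng(0)
X = rng.uniform(0, np.pi, (20, n_qubits))
y = np.where(X.sum(axis=1) > n_qubits * np.pi / 2, 1.0, -1.0)     # toy labels in {-1, +1}

shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
weights = pnp.array(rng.normal(size=shape), requires_grad=True)

opt = qml.AdamOptimizer(stepsize=0.1)                             # steps 5-6: gradient + update
for _ in range(30):
    weights = opt.step(lambda w: loss(w, X, y), weights)
print("final training loss:", loss(weights, X, y))
```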
2026 status. VQC is the most-published QML algorithm and the standard research workload at IBM, Quantinuum, IonQ, and the major academic groups. Production deployments are growing in image classification (medical-imaging segmentation, defect-detection on small datasets) where shallow VQCs can match or beat small classical convolutional networks with fewer parameters.
Variational Quantum Eigensolver (VQE)
What it solves. Finds the ground-state energy of a molecular or material Hamiltonian H. VQE is the canonical quantum-chemistry algorithm and the first variational quantum method to ship as a production workload at IBM, Quantinuum, and Pasqal.
How it works. VQE applies the variational principle from quantum mechanics: any trial state’s expectation value of H is an upper bound on the ground-state energy, so minimising the expectation value over a parameterised ansatz drives it toward the true ground state. The classical optimiser handles the search while the quantum machine evaluates the energy at each iterate.
- Use Hartree-Fock or a similar classical method to compute a reference state and produce the qubit Hamiltonian via the Jordan-Wigner or Bravyi-Kitaev mapping.
- Define a parameterised ansatz |ψ(θ)〉 (Unitary Coupled Cluster for chemistry, hardware-efficient ansatz for noisy hardware).
- For each Pauli string in the decomposed Hamiltonian, run the circuit and measure to estimate that string’s expectation value.
- Sum the weighted Pauli expectations to get 〈ψ(θ)|H|ψ(θ)〉.
- Pass the energy estimate to a classical optimiser; update θ to lower it.
- Repeat until the energy stops decreasing; the final state approximates the ground state.
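The variational principle is easiest to see on a Hamiltonian small enough to diagonalise by hand. The sketch below runs the loop in plain numpy for the toy single-qubit Hamiltonian H = Z + 0.5 X with an RY(θ) ansatz, where the exact ground-state energy is −√1.25 ≈ −1.118; a molecular VQE swaps in a Jordan-Wigner Hamiltonian and a UCC-style ansatz but keeps exactly this structure.

```python
import numpy as np

# Toy qubit Hamiltonian H = Z + 0.5 X; exact ground-state energy is -sqrt(1.25)
Z = np.array([[1, 0], [0, -1]])
X = np.array([[0, 1], [1, 0]])
H = Z + 0.5 * X

def ansatz(theta):
    """Step 2: the trial state RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def energy(theta):
    """Steps 3-4: the expectation value <psi(theta)|H|psi(theta)>."""
    psi = ansatz(theta)
    return psi @ H @ psi

theta, lr = 0.1, 0.4
for step in range(60):                                                    # steps 5-6: classical optimiser loop
    grad = (energy(theta + np.pi / 2) - energy(theta - np.pi / 2)) / 2    # parameter-shift gradient
    theta -= lr * grad

print("VQE energy:  ", energy(theta))
print("exact ground:", -np.sqrt(1.25))
```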
2026 status. VQE is the production QML workload for small to intermediate molecules at Quantinuum, IBM, and Pasqal, and it anchors the Cleveland Clinic, RIKEN, and IBM 12,635-atom protein simulation milestone from May 2026. ADAPT-VQE (adaptive ansatz growth) and VQITE (imaginary-time evolution) are the two standard improvements in 2026 production pipelines.
QAOA (Quantum Approximate Optimisation Algorithm)
What it solves. Combinatorial-optimisation problems like MaxCut, portfolio selection, vehicle routing, and any constraint-satisfaction problem with an Ising-form cost function. QAOA produces an approximate optimum after p alternating layers of problem and mixer Hamiltonians, with quality improving (asymptotically) as p grows.
How it works. QAOA can be read as a discrete-step Trotterised approximation of quantum annealing, with two angles per layer that a classical optimiser tunes. The final-state distribution concentrates probability mass on bit strings with low cost, and sampling many shots returns the best solution found.
- Encode the cost function as a problem Hamiltonian H_C that is diagonal in the computational basis (each bit-string assignment maps to an eigenvalue equal to its cost).
- Define a mixer Hamiltonian H_M (the sum of Pauli-X across all qubits is the standard choice).
- Prepare the equal-superposition state as the initial state.
- Apply p alternating layers of exp(-iγ_k H_C) and exp(-iβ_k H_M) for k = 1..p.
- Measure the final state in the computational basis to sample a candidate solution.
- Classical outer loop optimises the angles (γ, β) to minimise the expected cost; after convergence, sample many times and return the best bit string found.
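The smallest possible instance (MaxCut on a single edge, two qubits, p = 1) fits in a few lines of dense linear algebra and exercises every step in the list above. A coarse grid search stands in for the classical outer-loop optimiser here; real deployments use COBYLA, SPSA, or gradient-based optimisers instead.

```python
import numpy as np
from itertools import product

# MaxCut on a single edge (0, 1): cost of bit string z is 1 if z0 != z1, else 0.
costs = np.array([0, 1, 1, 0])                       # step 1: diagonal problem Hamiltonian H_C
plus = np.full(4, 0.5)                               # step 3: the equal superposition |+>|+>

def rx(beta):
    """exp(-i beta X) on one qubit (the standard Pauli-X mixer, step 2)."""
    return np.array([[np.cos(beta), -1j * np.sin(beta)],
                     [-1j * np.sin(beta), np.cos(beta)]])

def qaoa_state(gamma, beta):
    state = np.exp(-1j * gamma * costs) * plus       # step 4a: phase-separation layer exp(-i gamma H_C)
    mixer = np.kron(rx(beta), rx(beta))              # step 4b: mixer layer exp(-i beta X) on each qubit
    return mixer @ state

def expected_cost(gamma, beta):                      # steps 5-6: measurement statistics -> expected cut value
    probs = np.abs(qaoa_state(gamma, beta)) ** 2
    return probs @ costs

# Classical outer loop: coarse grid search over the two p = 1 angles
best = max((expected_cost(g, b), g, b)
           for g, b in product(np.linspace(0, np.pi, 40), repeat=2))
value, gamma, beta = best
probs = np.abs(qaoa_state(gamma, beta)) ** 2
print("best expected cut:", round(value, 3))         # approaches 1.0, the optimal cut for one edge
print("sampling probabilities for 00,01,10,11:", np.round(probs, 3))
```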
2026 status. QAOA is the dominant QML workload on neutral-atom QPUs (QuEra, Pasqal, Atom Computing) where 1,000+ qubit registers and native Rydberg-blockade interactions encode combinatorial problems directly without a gate-overhead penalty. The BMW Group and Volkswagen production pipelines for metal-forming optimisation and traffic-flow control are anchored on this approach.
Quantum Neural Network (QNN)
What it solves. The same classification and regression problems as VQC, but with deeper, more expressive ansatzes inspired by classical deep neural networks. The QNN bet is that more layers approximate richer functions, mirroring the depth-equals-capacity argument from classical deep learning.
How it works. Training a QNN reuses the VQC machinery (parameter-shift gradients plus a classical optimiser) but with deeper architectures that interleave entangling layers, single-qubit rotations, and re-uploaded data. The depth advantage is real but exposes the model to the barren-plateau trap, so 2026 production QNNs use problem-aware ansatzes and local cost functions to keep gradients addressable.
- Choose an encoder that loads x into qubits (typically angle or feature-map encoding).
- Stack multiple layers of single-qubit rotations parameterised by θ and entangling gates (CNOT or CZ); inject the encoded data again between layers (data re-uploading).
- Measure one or more observables to produce the model output (a single Pauli-Z for binary classification, multiple observables for multi-class).
- Compute a local cost function (computed on a small qubit subset) to keep gradients addressable in deeper architectures.
- Compute parameter-shift gradients and update θ with Adam or quantum natural gradient.
- Repeat for many epochs; monitor for vanishing gradients and switch to layerwise pre-training if the loss stalls.
2026 status. QNNs are the most-active research target in QML and the dominant workload in academic-track publications. The Quantinuum and IBM applications-engineering teams ship pretrained meta-learners for QNN initialisation as a standard pipeline component, sidestepping the barren-plateau cold start that historically killed naive deep-circuit experiments.
Quantum GAN (QGAN)
What it solves. Sampling from a learned probability distribution. QGANs generate synthetic data that mimics a target distribution, with applications in financial Monte Carlo, medical-imaging augmentation, and synthetic-data generation for regulated domains.
How it works. QGAN is a direct port of the classical GAN architecture, with a generator and a discriminator playing a minimax game. The generator is always a parameterised quantum circuit; the discriminator is typically classical for efficiency but can also be a quantum classifier.
- Generator: define a parameterised quantum circuit G(θ_G) that takes latent qubits in a random initial state and produces samples by measurement.
- Discriminator: define a classifier D(φ) (classical neural network or quantum kernel) that scores each sample as real or generated.
- Train both networks adversarially: the discriminator maximises its accuracy at distinguishing samples; the generator minimises the discriminator’s accuracy.
- Alternate gradient updates: the discriminator is trained with classical SGD against real-versus-generated batches; the generator is trained with parameter-shift gradients against the discriminator’s negative output.
- After convergence, the generator’s output distribution matches the training distribution closely enough that the discriminator cannot reliably distinguish.
- Sample by running the trained generator circuit and measuring (a deliberately simplified training sketch follows this list).
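A deliberately simplified, simulator-only QGAN sketch in PennyLane. It trains on the generator's exact Born-rule probabilities rather than finite samples, and the discriminator is a one-layer logistic model over basis states; the minimax alternation is the point of the illustration, not the architecture.

```python
# Toy QGAN: quantum generator vs. a tiny classical logistic discriminator,
# trained on exact simulator probabilities. All sizes are illustrative.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def generator_probs(theta_g):
    qml.StronglyEntanglingLayers(theta_g, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))        # P_theta over 4 basis states

target = np.array([0.5, 0.0, 0.0, 0.5])            # toy "real" distribution
basis = np.eye(2 ** n_qubits)                      # one-hot basis states

def discriminator(phi, x_onehot):
    # logistic score: estimated probability that a basis state is "real"
    return 1 / (1 + np.exp(-(x_onehot @ phi[:-1] + phi[-1])))

def d_loss(phi, theta_g):
    p_g = generator_probs(theta_g)
    scores = discriminator(phi, basis)
    real = -np.sum(target * np.log(scores + 1e-9))       # label real as 1
    fake = -np.sum(p_g * np.log(1 - scores + 1e-9))      # label generated as 0
    return real + fake

def g_loss(theta_g, phi):
    p_g = generator_probs(theta_g)
    return -np.sum(p_g * np.log(discriminator(phi, basis) + 1e-9))

shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=n_qubits)
theta_g = np.random.uniform(0, 2 * np.pi, shape, requires_grad=True)
phi = np.random.normal(0, 0.1, 2 ** n_qubits + 1, requires_grad=True)
opt_g, opt_d = qml.AdamOptimizer(0.1), qml.AdamOptimizer(0.1)

for step in range(100):                            # alternating minimax updates
    phi = opt_d.step(lambda p: d_loss(p, theta_g), phi)
    theta_g = opt_g.step(lambda t: g_loss(t, phi), theta_g)

print(np.round(generator_probs(theta_g), 3))
```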
2026 status. QGANs are the second-largest generative workload after QCBMs and ship in production at JPMorgan and IBM-Network finance pipelines for derivative-pricing Monte Carlo and risk-distribution generation. The 2025 implementations on IBM Heron R2 have closed most of the gap to classical GANs on small financial-time-series and medical-imaging benchmarks.
Quantum Circuit Born Machine (QCBM)
What it solves. The same generative-modelling task as QGAN, but without the minimax instability that makes QGAN training fragile. QCBMs are the dominant 2026 quantum generative model for portfolio-distribution sampling and Monte Carlo risk simulation.
How it works. A QCBM treats the parameterised circuit as a probability distribution directly: the Born-rule probabilities of computational-basis measurements are the model output. Training minimises a divergence between the model and target distributions without needing a discriminator network.
- Define a parameterised circuit |ψ(θ)〉 that produces output distribution P_θ(x) = |〈x|ψ(θ)〉|² over computational-basis states |x〉.
- Sample by running the circuit and measuring in the computational basis; the empirical sample distribution is the model output.
- Compute a divergence (maximum mean discrepancy with a Gaussian kernel, or Kullback-Leibler divergence) between the empirical samples and the target distribution.
- Compute parameter-shift gradients of the divergence with respect to θ.
- Update θ with a classical optimiser (Adam or SGD); repeat for many epochs.
- After convergence, sample by running the trained circuit and measuring (a simulator-only training sketch follows this list).
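A minimal QCBM training loop in PennyLane with a Gaussian-kernel MMD loss. For simplicity the sketch uses the simulator's exact Born probabilities instead of empirical samples; the three-qubit register, target distribution, and kernel bandwidth are illustrative assumptions.

```python
# QCBM sketch: the circuit's Born-rule distribution is trained to match a
# toy target distribution by minimising a kernel MMD. Simulator-only.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def born_probs(weights):
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))    # P_theta(x) = |<x|psi(theta)>|^2

# toy bimodal target over the 8 computational-basis states
target = np.array([0.35, 0.0, 0.05, 0.05, 0.05, 0.05, 0.0, 0.45])

# Gaussian kernel on integer-encoded bit strings, for the MMD loss
xs = np.arange(2 ** n_qubits)
sigma = 1.0
K = np.exp(-((xs[:, None] - xs[None, :]) ** 2) / (2 * sigma ** 2))

def mmd_sq(p, q):
    d = p - q
    return d @ K @ d                            # squared MMD under kernel K

def cost(weights):
    return mmd_sq(born_probs(weights), target)

shape = qml.StronglyEntanglingLayers.shape(n_layers=4, n_wires=n_qubits)
weights = np.random.uniform(0, 2 * np.pi, shape, requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.1)
for step in range(200):
    weights = opt.step(cost, weights)

print(np.round(born_probs(weights), 3))         # learned distribution
```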
2026 status. QCBMs are the production generative-model choice for Multiverse Computing’s quantum-finance pipelines and the Zapata AI orchestration platform, both anchored on IBM and IonQ backends. The Benedetti et al. 2019 npj Quantum Information paper remains the canonical architectural reference for the family.
HHL (Harrow-Hassidim-Lloyd)
What it solves. Linear systems Ax = b where A is a sparse, well-conditioned matrix. HHL produces a quantum state proportional to A⁻¹b in time poly(log N), exponentially faster than any classical algorithm that reads all of A and b under the same input model.
How it works. HHL chains three quantum primitives: quantum phase estimation to expose A’s eigenvalues, a conditional rotation to invert them, and uncomputation to undo the phase estimation cleanly. The fine-print catch (which the Aaronson cautionary essay made famous) is that the output is a quantum state, not a classical vector, so you can only extract a single observable per run.
- Prepare the quantum state |b〉 whose amplitudes are the normalised entries of b.
- Apply quantum phase estimation against A to write each eigenvalue λ_j into an ancilla register; the joint state is now a superposition over (|u_j〉, |λ_j〉) pairs where |u_j〉 are A’s eigenvectors.
- Apply a conditional rotation on a flag qubit that loads amplitude 1/λ_j onto each eigenvalue branch.
- Uncompute the phase estimation to disentangle the eigenvalue register from the state register.
- Measure the flag qubit; conditioned on the success outcome, the remaining state has amplitudes proportional to A⁻¹b.
- Read out the single observable of interest (e.g., x_i for a particular index, or x^T M x for a precomputed M); the state-level arithmetic behind these steps is summarised below.
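In state-level notation the pipeline reads as follows, writing |b〉 in A's eigenbasis, with C a constant chosen below the smallest eigenvalue and Z the post-selection normalisation (a compact restatement of the steps above, not an additional result):

$$
|b\rangle=\sum_j \beta_j |u_j\rangle
\;\xrightarrow{\text{QPE}}\;
\sum_j \beta_j |u_j\rangle|\lambda_j\rangle
\;\xrightarrow{\text{rotation}}\;
\sum_j \beta_j |u_j\rangle|\lambda_j\rangle\left(\sqrt{1-\tfrac{C^2}{\lambda_j^2}}\,|0\rangle+\tfrac{C}{\lambda_j}\,|1\rangle\right)
\;\xrightarrow{\substack{\text{uncompute,}\\ \text{postselect }|1\rangle}}\;
\frac{1}{Z}\sum_j \frac{\beta_j}{\lambda_j}\,|u_j\rangle\;\propto\;A^{-1}|b\rangle .
$$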
2026 status. HHL is the historical foundation of a whole family of derived QML algorithms (quantum PCA, recommendation systems, regression) and remains useful in quantum-native pipelines where the input b is already a quantum state. The Tang 2018 dequantisation result killed most of its claimed advantage on classical sampling-access data, but the 2024-2026 fault-tolerance roadmap may revive specific HHL use cases on early-FTQC machines.
How data gets into a QML circuit
Data encoding is the foundational engineering question. A quantum circuit only knows how to operate on quantum states, so classical data has to be embedded into a Hilbert space first. The four standard QML data encoding schemes are basis encoding (each classical bit becomes one qubit), amplitude encoding (a normalised d-dimensional vector becomes the amplitudes of a log₂(d)-qubit state), angle encoding (each feature becomes a rotation angle), and feature-map encoding (a deliberately non-trivial circuit that produces a Hilbert-space embedding designed for the downstream task).
The choice matters enormously. Amplitude encoding is theoretically beautiful (you can fit a 1024-dimensional vector into 10 qubits) but is extremely hard to implement on real hardware because state preparation requires deep circuits. Angle encoding is easy to implement but does not scale to high-dimensional data without a lot of qubits. Feature-map encoding with multi-qubit entangling gates is what gives quantum kernel methods their genuine quantum-vs-classical performance gap, and the choice of feature map (ZZ-feature map, IQP-style, hardware-efficient ansatz) is where the engineering really happens.
Schuld and Killoran’s 2019 Phys. Rev. Lett. paper on quantum machine learning in feature Hilbert spaces is the cleanest formal account of why encoding is everything. The choice of feature map determines the geometry of the Hilbert-space embedding, and the geometry of the embedding determines whether the resulting kernel can express the structure in the data.
A bad feature map produces a quantum kernel that is provably equivalent to a classical Gaussian kernel and gives no advantage; a good feature map produces a kernel that no classical computer can compute efficiently, which is the entire point of running the workload on a quantum machine in the first place. The code sketch below loads the same four-feature input vector under three of the schemes, and the table in the next subsection compares the qubit and circuit-depth costs of all six.
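A minimal PennyLane illustration, run on a statevector simulator; the feature values are arbitrary, and the IQP embedding stands in here for a generic ZZ-style entangling feature map.

```python
# Three ways to load the same 4-feature vector; qubit counts follow the
# comparison table in the next subsection. Simulator-only sketch.
import pennylane as qml
from pennylane import numpy as np

x = np.array([0.4, 1.1, 2.3, 0.7])               # one 4-feature input

dev4 = qml.device("default.qubit", wires=4)
dev2 = qml.device("default.qubit", wires=2)

# Angle encoding: one rotation per feature, 4 qubits, O(1) depth, product state
@qml.qnode(dev4)
def angle_encoded(x):
    qml.AngleEmbedding(x, wires=range(4))
    return qml.state()

# Amplitude encoding: 4 amplitudes fit into log2(4) = 2 qubits, deep state prep
@qml.qnode(dev2)
def amplitude_encoded(x):
    qml.AmplitudeEmbedding(x, wires=range(2), normalize=True)
    return qml.state()

# Entangling feature map: IQP-style phases as a stand-in for a ZZ feature map
@qml.qnode(dev4)
def entangling_encoded(x):
    qml.IQPEmbedding(x, wires=range(4), n_repeats=2)
    return qml.state()

for name, state in [("angle", angle_encoded(x)),
                    ("amplitude", amplitude_encoded(x)),
                    ("entangling", entangling_encoded(x))]:
    print(f"{name:>10}: {len(state)}-dimensional statevector")
```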
Six encodings compared: qubits, depth, classical simulability
The right way to choose an encoder is to weigh four things at once: how many qubits the scheme costs you for an N-feature input, how deep the state-preparation circuit is on real hardware, whether the resulting feature map is provably hard for a classical computer to evaluate, and how tolerant the encoding is to gate noise. The table below is the side-by-side cheatsheet.
| Encoding | Qubits for N features | State-prep depth | Classically simulable? | Best for |
|---|---|---|---|---|
| Basis | N | O(1) (just X gates) | Yes (product state) | Toy examples; binary inputs only. |
| Amplitude | log₂(N) | O(N) (general state prep) | Sometimes | Quantum-native data where log compression actually pays off. |
| Angle | N | O(1) (single rotations) | Yes (product state, no entanglement) | Hardware-efficient prototyping; no claim of advantage. |
| Phase | N | O(1) (single phase rotations) | Yes (product state) | Same as angle, with phase rather than rotation parameter. |
| Dense angle | N/2 | O(1) | Yes (still product state) | Halves the qubit cost of angle encoding by packing two features per qubit. |
| ZZ feature map (or Pauli FM) | N | O(N) at depth-2 reps | No (entangling) | Where genuine quantum-vs-classical advantage shows up; the production default for kernel methods. |
The “classically simulable” column is the load-bearing one for any quantum-advantage claim. Encodings that produce only product states (basis, angle, phase, dense-angle) can be evaluated by any classical computer in polynomial time, which means a kernel built on them gives no quantum advantage no matter what the downstream classifier looks like. ZZ-style entangling feature maps are the smallest step that crosses into provably-hard-to-simulate territory, which is why almost every credible 2026 production QML deployment uses them or a close variant.
The encoder is the single most consequential decision in any QML project, because it determines whether the resulting kernel or feature map can express the structure in the data at all. Encodings that produce product states (basis, angle, phase) give no quantum advantage; entangling feature maps like ZZ-style or IQP-style are the smallest step that crosses into provably-hard-to-simulate territory.
Quantum data engineering before encoding
Between the raw dataset and the quantum encoder sits a layer of classical preprocessing that no QML deployment can skip, and that is typically where the most engineering hours go. Feature normalisation is the first step: amplitude encoding demands a unit-norm input vector, angle encoding wants features rescaled to the [0, 2π] range to avoid wrapping degeneracies, and ZZ-style feature maps perform best when each feature sits in a bounded range around zero so the entangling gates produce non-trivial relative phases. Skipping this step is the single most common reason a textbook QML algorithm runs cleanly on toy data but loses to a classical baseline on real data.
Dimensionality reduction matters even more for amplitude-encoded workloads. A 1024-dimensional vector packs into 10 qubits in principle, but the deep state-preparation circuit makes the encoder so expensive that the practical sweet spot is sixteen to sixty-four features after PCA, autoencoder, or random-projection compression. The PennyLane Datasets module ships pre-prepared compressed feature versions of the standard QML benchmarks for exactly this reason, and Qiskit Machine Learning includes a sklearn-compatible preprocessing pipeline that handles encoder-specific scaling automatically.
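A minimal, scikit-learn-only sketch of those first two steps; the scaling ranges, the 16-component PCA, and the clipping bound are illustrative rules of thumb rather than validated settings.

```python
# Encoder-specific preprocessing before any quantum encoding step.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.random.rand(500, 40)          # stand-in for a real 40-feature dataset

# Angle encoding: rescale each feature into [0, 2*pi) to avoid wrapping
X_angle = MinMaxScaler(feature_range=(0, 2 * np.pi)).fit_transform(X)

# Amplitude encoding: compress first, then L2-normalise every row to unit norm
X_pca = PCA(n_components=16).fit_transform(X)
X_amp = X_pca / np.linalg.norm(X_pca, axis=1, keepdims=True)

# ZZ-style feature map: bounded range centred on zero
X_zz = np.clip(StandardScaler().fit_transform(X), -np.pi / 2, np.pi / 2)
```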
Encoding-aware feature selection is the under-discussed third step. The features that contribute most to a classical SVM are not necessarily the ones a quantum kernel can exploit, because the quantum feature map projects them into a Hilbert-space geometry that classical-importance metrics do not anticipate. The 2025 IBM and Quantinuum applications-engineering teams now ship encoder-validation scripts that swap features in and out and measure the resulting quantum-kernel rank, which is a far better predictor of downstream classifier accuracy than any classical feature-importance method.
Quantum machine learning libraries you actually use
Six software stacks dominate the QML landscape in 2026. The right choice depends on which hardware backend you plan to target, which classical machine-learning ecosystem your team already uses, and how much vendor abstraction you want sitting between your code and the QPU.
PennyLane (Xanadu)
PennyLane is the dominant pure-play QML library, built around an automatic-differentiation engine that integrates natively with PyTorch, TensorFlow, and JAX. The PennyLane tutorials and demos are the canonical learning resources, and the PennyLane MPI integration with the exascale Frontier supercomputer made it the first QML library to run at that scale. Backend-agnostic: the same circuit code runs on Xanadu photonic, IBM superconducting, IonQ trapped-ion, and AWS Braket simulators.
Qiskit Machine Learning (IBM)
Qiskit Machine Learning is the IBM-first-party QML toolkit built on Qiskit Runtime. The library exposes QSVMs, quantum neural networks, and quantum kernel methods as scikit-learn-compatible estimators, which means existing classical ML pipelines can swap in quantum components with minimal changes. Runs natively on IBM Quantum Platform hardware via Qiskit Runtime sessions, which batch many circuit executions to amortise calibration overhead and keep per-shot cost low.
The 2026 strength is the depth of the stack. Qiskit Runtime ships built-in error mitigation through Estimator primitives (zero-noise extrapolation, probabilistic error cancellation, dynamical decoupling), all addressable through a single resilience-level setting. The standard production pattern for IBM-hosted QML in 2026 is a Session opened against the Heron R2 backend, an Estimator with resilience level 2, and a scikit-learn pipeline that calls the quantum kernel through a custom kernel function. Most JPMorgan, HSBC, and Wells Fargo IBM-Network publications use exactly this pattern.
TensorFlow Quantum (Google)
TensorFlow Quantum integrates Google’s Cirq quantum-circuit library with TensorFlow’s neural-network training infrastructure. The strength is the tight integration with classical TensorFlow models for hybrid quantum-classical neural networks; the weakness is the smaller community than PennyLane and the lack of first-party access to Google’s Sycamore and Willow hardware (which Google operates internally rather than as a public cloud). Best suited to teams already on the TensorFlow stack who plan to run on simulators or on third-party QPUs through TFQ’s interop adapters.
Classiq (multi-backend compilation)
Classiq is the quantum-software platform that compiles high-level functional specifications into vendor-optimised circuits across IBM, IonQ, Rigetti, Quantinuum, Pasqal, AWS Braket, OQC, QuEra, AQT, and Alice & Bob backends. For QML workloads where the same algorithm needs to run on multiple QPU architectures (multi-cloud benchmarking is the standard 2026 pattern), Classiq abstracts away the per-vendor optimisation work, which can shave 30 to 60 percent off the gate count compared with hand-written transpilation. The HSBC strategic investment in Classiq’s Series B funding round in 2024 brought finance-grade compliance and audit support into the platform.
Pasqal Pulser and Quandela Perceval
Pasqal Pulser is the open-source library for analog-mode neutral-atom programming, where the user describes the laser-pulse sequence and atom register geometry directly rather than expressing the algorithm as a digital gate sequence. Pulser is what makes Hamiltonian-encoded QML on Pasqal hardware tractable, and it is the canonical entry point for any neutral-atom QML workload. Quandela Perceval is the photonic-circuit programming framework, with first-party simulators and access to Quandela’s photonic QPUs through the cloud. Both libraries are essential when the algorithm needs hardware-native primitives, and both run in PennyLane and Qiskit through interop plugins.
OpenFermion and PySCF
Quantum-chemistry libraries that integrate with PennyLane and Qiskit for variational quantum eigensolver (VQE) workloads. OpenFermion handles the second-quantisation transformations (Jordan-Wigner, Bravyi-Kitaev) that map fermionic Hamiltonians onto qubits; PySCF computes the underlying classical electronic-structure quantities (Hartree-Fock orbitals, integrals) that the quantum eigensolver refines. Most published QML chemistry results in 2026 use this stack with a PennyLane or Qiskit Runtime back-end, the IBM-Pasqal supercomputing partnership, or the Microsoft Azure Quantum Elements integration.
cuQuantum and CUDA-Q (NVIDIA)
NVIDIA’s cuQuantum SDK and CUDA-Q programming model are the incumbent GPU-accelerated quantum-circuit simulators in 2026. PennyLane Lightning, Qiskit Aer, and TensorFlow Quantum all expose CUDA-Q backends, which means a workload that fits in 35 to 40 qubits of state-vector simulation can run on an NVIDIA H200 cluster in seconds where it would take hours on a CPU. For QML training loops where the inner-loop circuit needs to be evaluated thousands of times per epoch, the GPU-accelerated simulator is what makes the workflow tractable before moving to real hardware for the final validation runs.
Startups beyond the big vendors
Beyond the IBM, Google, IonQ, Quantinuum, Xanadu, and Pasqal-class platforms, a layer of smaller vendors ships QML-relevant tooling and applications in 2026. The list below is the working shortlist for technology buyers evaluating the broader ecosystem; it changes faster than the platform-vendor list does because the smaller companies pivot more aggressively as underlying hardware capability shifts.
- Multiverse Computing. Spanish-Canadian QML and quantum-inspired AI specialist. The CompactifAI platform for tensor-network LLM compression is the commercial flagship; quantum-finance and chemistry pipelines run on IBM, IonQ, and Pasqal backends.
- QC Ware. US quantum-software platform. The Forge SDK exposes QML primitives (Q-means, quantum SVMs, QAOA) as managed APIs across IBM, IonQ, and AWS Braket; finance and pharma are the strongest customer verticals.
- Zapata AI. Originally a Harvard Aspuru-Guzik spinout; pivoted in 2024 to 2025 from pure quantum software into hybrid generative AI with quantum-circuit components. The Orquestra orchestration platform ships QML pipelines alongside classical generative-AI ones.
- qBraid. Quantum-developer-tooling startup; web-based GPU-accelerated simulator plus access to IBM and IonQ. Sponsors the $20,000 Bitcoin Quantum Advantage Challenge that anchors the 2026 community competitions list.
- Quantum Brilliance. Australian diamond nitrogen-vacancy hardware vendor. Targets room-temperature embedded QPU modules for edge QML and federated-learning deployments where cryo-cooled superconducting machines are impractical.
- Q-CTRL. Sydney-based quantum-control and error-suppression specialist. The Fire Opal and Boulder Opal pipelines ship native QML circuit-optimisation passes that compose with PennyLane and Qiskit, with 2025 published gains of three to ten times in effective circuit depth.
- Algorithmiq. Finnish quantum-algorithms specialist focused on chemistry, drug discovery, and noise-aware variational methods. Production QML chemistry pipelines deployed with Pfizer, Boehringer Ingelheim, and other named pharma customers in 2024 to 2025.
- Horizon Quantum Computing. Singapore-based quantum-compiler company. The Triple Alpha compilation platform produces hardware-optimised circuits for QML workloads with vendor-agnostic frontends, competing directly with Classiq in the multi-cloud compilation segment.
Working code: a quantum classifier in 30 lines
The fastest way to internalise the quantum machine learning workflow is to read working code. The two snippets below build the same variational quantum classifier on the same toy dataset (the scikit-learn moons benchmark, two interleaving half-circles), once in PennyLane and once in Qiskit Machine Learning, so you can see the parallel between the two dominant libraries.
PennyLane variational classifier (28 lines)
```python
import pennylane as qml
from pennylane import numpy as np
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
X = StandardScaler().fit_transform(X)
y = 2 * y - 1  # map labels to {-1, +1}

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="autograd")
def circuit(weights, x):
    qml.AngleEmbedding(x, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

def cost(weights, X, y):
    preds = np.array([circuit(weights, x) for x in X])
    return np.mean((preds - y) ** 2)

shape = qml.StronglyEntanglingLayers.shape(n_layers=3, n_wires=n_qubits)
weights = np.random.uniform(0, 2 * np.pi, shape, requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.05)

for epoch in range(50):
    weights = opt.step(lambda w: cost(w, X, y), weights)
    if epoch % 10 == 0:
        print(f"epoch {epoch}: loss = {cost(weights, X, y):.4f}")
```
The data flow is exactly the four-stage QML pipeline from earlier in this article: AngleEmbedding is the encoder, StronglyEntanglingLayers is the trainable ansatz, qml.expval(qml.PauliZ(0)) is the measurement, and the Adam optimiser is the classical outer loop that updates the weights. Run this on a laptop and the loss drops from 1.0 at epoch 0 to under 0.3 by epoch 50, recovering the standard moons-classifier accuracy of 90 percent or so.
Qiskit Machine Learning quantum kernel SVM (24 lines)
```python
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from qiskit.circuit.library import zz_feature_map
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.state_fidelities import ComputeUncompute
from qiskit.primitives import StatevectorSampler

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
X = StandardScaler().fit_transform(X)

feature_map = zz_feature_map(feature_dimension=2, reps=2, entanglement="linear")
sampler = StatevectorSampler()
fidelity = ComputeUncompute(sampler=sampler)
kernel = FidelityQuantumKernel(feature_map=feature_map, fidelity=fidelity)

X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]
K_train = kernel.evaluate(x_vec=X_train)
K_test = kernel.evaluate(x_vec=X_test, y_vec=X_train)

svc = SVC(kernel="precomputed").fit(K_train, y_train)
preds = svc.predict(K_test)
print(f"test accuracy = {accuracy_score(y_test, preds):.3f}")
```
This is the kernel-method version of the same workflow. The ZZ feature map is the encoder; FidelityQuantumKernel computes the kernel matrix through the ComputeUncompute state-fidelity primitive; the classical scikit-learn SVC handles the actual training. Both snippets are deliberately small enough to fit in your head and concrete enough to extend: swap the encoder, change the ansatz depth, point at IBM Quantum Platform hardware with a one-line backend change.
When quantum machine learning beats classical machine learning
The quantum-advantage question is the hardest in QML, and the honest 2026 answer is that the empirical picture is mixed. There are concrete cases where the modality can demonstrably outperform classical baselines, and there are large empirical regimes where it does not.
Where QML genuinely wins
Three places where QML has shown real advantage. First, on quantum-native data (chemistry samples, sensor outputs, simulation snapshots) where the data is naturally a quantum state and amplitude encoding becomes free. Second, with carefully-designed feature maps that introduce entanglement (ZZ-feature maps, IQP-style encoders) where the resulting Hilbert-space embedding is provably hard for classical kernels to reproduce. Third, with classical-shadows-style measurement protocols that estimate many quantities simultaneously from a single set of randomised measurements, reducing the effective shot budget by orders of magnitude.
Where classical baselines still dominate
Three places where classical baselines remain stubbornly competitive. First, on classical tabular data with moderate dimension where classical SVMs, gradient-boosted trees, and small neural networks dominate the empirical leaderboard. Second, on small datasets where the per-shot cost of quantum hardware overwhelms any algorithmic advantage. Third, on problems where the dequantisation literature has produced classical algorithms that match or beat the quantum approach. NISQ-era QSVM generalisation bounds capture the current academic consensus on where quantum kernels do and do not produce advantage, and the picture is sharpening rather than collapsing.
The Huang and Liu-Arunachalam-Temme papers
Two specific results define the modern empirical picture. Huang et al.’s 2021 “power of data” Nature Communications paper proves that the relevant question is not whether a quantum algorithm beats a classical algorithm in isolation, but whether it beats the best classical algorithm that has access to the same training data. Many earlier exponential-speedup claims fail this stricter test. Liu, Arunachalam, and Temme’s 2021 Nature Physics paper on quantum-classical separation in supervised learning gives the positive counterpart: there are concrete classification problems (variants of the discrete logarithm problem) where a quantum kernel demonstrably learns where any classical kernel provably cannot. Those two papers, taken together, are the load-bearing intellectual scaffolding for everyone now deploying quantum kernel methods in 2026.
Dequantisation: the third pillar of the modern view
Aaronson and Tang’s “dequantisation” line of work is the third pillar of the modern view. Tang’s 2018 classical algorithm for recommendation systems matched the polylogarithmic dimension-dependence of the HHL-style quantum algorithm by Kerenidis and Prakash (at the cost of larger polynomial factors), demolishing what had been considered one of the strongest quantum machine-learning advantages. Subsequent dequantisation papers have done the same to many HHL-derived QML algorithms, and the standing rule for any new quantum-advantage claim is that it has to survive a sampling-access classical baseline (the dequantisation analogue) before publication-quality status is granted.
QML demonstrably wins on quantum-native data, on problems with carefully-designed entangling feature maps, and on workloads where classical-shadows-style measurements can replace many separate observable estimations. It does not yet win on classical tabular data with moderate dimension, on small datasets where per-shot cost dominates, or on any problem with a known dequantisation result.
What QML can provably do, and the one provable wall
Most of the QML “advantage” debate runs on benchmarks and intuition. The three threads in this section are different: they are mathematical theorems and historical results about specific problem classes, and together they are the load-bearing intellectual scaffolding underneath every serious 2026 vendor pitch. Knowing all three is what separates a careful QML practitioner from a marketer.
HHL and the quantum linear-systems lineage
The 2009 Harrow-Hassidim-Lloyd algorithm (HHL) for solving sparse well-conditioned linear systems Ax = b in time polynomial in log(N) is the single most cited result in the history of quantum machine learning, and the founding theorem for a whole generation of “exponential speedup” claims. Quantum recommendation systems (Kerenidis and Prakash 2017), quantum principal component analysis (Lloyd, Mohseni, Rebentrost 2014), and the quantum regression line all sit downstream of HHL, because each one reduces its inner-loop bottleneck to solving a linear system that HHL handles exponentially faster than any then-known classical algorithm.
The 2018 to 2022 dequantisation programme changed the picture. Tang’s 2018 result on classical algorithms for recommendation systems, followed by Gilyén et al.’s 2018 work on classical sampling-access algorithms for low-rank matrices, showed that under the same input model HHL assumes (efficient quantum-state preparation of the input vector), classical sampling-based algorithms can match the polylogarithmic-in-N runtime. The exponential advantage that HHL appeared to give was therefore not really exponential at all; it was a careful and somewhat hidden assumption about how the input is loaded into the algorithm.
The modern reading of HHL in 2026 is that it remains a useful primitive when the input is naturally a quantum state (as in chemistry or quantum sensing) but offers no exponential advantage on classical data accessed through sampling oracles, which is the regime almost every commercial QML workload sits in. The cautionary lesson is structural and persists across the field: every claimed exponential speedup must be checked against the best classical algorithm with comparable input access, and the discrete-log result in the next subsection is one of the few QML advantages that survives this scrutiny.
The provable upside: the discrete-log kernel
Liu, Arunachalam, and Temme proved in 2021 that there exists a concrete supervised-learning problem (a classification task built on the discrete-logarithm function) where a quantum kernel learns the right decision boundary with polynomial sample complexity, while no classical kernel can do so unless it solves the discrete-logarithm problem in polynomial time. The latter is widely believed to be intractable (it is the security assumption behind much of pre-quantum public-key cryptography), so the result is the closest thing the QML field has to a hard quantum-advantage theorem.
The catch is that the discrete-log problem is contrived for the proof; nobody is going to deploy a credit-risk model on a discrete-logarithm dataset. The constructive value of the result is that it shows the quantum-classical separation is not vacuous: there is at least one well-defined regime where a quantum kernel beats every classical alternative on first principles. The open research programme since then has been to find natural-data analogues of this regime, which is what most of the kernel-method publications in 2024 to 2026 are chasing.
The provable wall: exponential concentration of quantum kernels
The negative-direction theorem comes from Thanasilp, Wang, Cerezo, and Holmes in 2024: for many natural quantum-kernel constructions, the kernel value between any two random data points concentrates exponentially around a constant as the qubit count grows. The practical effect is that on large quantum machines, every pair of points looks the same to the kernel, and the downstream classifier loses all signal.
This is structurally similar to the barren-plateau problem for variational circuits, but it is kernel-specific and shows up at the data-encoding step rather than during training. The 2025 to 2026 mitigation work centres on projected quantum kernels (which project the quantum kernel onto a small classical feature space before training) and on problem-aware feature maps that avoid the concentration regime by construction. Both directions are active enough that “exponential concentration” is the second technical question every kernel-method engineer has to answer in 2026, after barren plateaus.
Six common QML myths debunked
“Quantum machine learning is faster than classical ML for everything.” Reality: false. Quantum machine learning shows demonstrable advantage on a narrow set of problems (quantum-native data, structured kernel embeddings, certain optimisation tasks). On the vast majority of classical tabular and image data, gradient-boosted trees and deep neural networks remain faster and more accurate, and will for the foreseeable future.
“You can feed raw classical data straight into a QML algorithm.” Reality: false. Every QML algorithm needs an encoder that maps classical data into a quantum state, and the choice of encoder is the dominant engineering decision. A bad encoder collapses the quantum kernel to a classical equivalent and gives no advantage.
“More qubits always means better QML.” Reality: false. Beyond a problem-dependent threshold, deep generic circuits hit the barren-plateau regime where gradients vanish and training stalls. Modality, connectivity, and ansatz structure matter at least as much as raw qubit count.
“Quantum machine learning will replace deep learning.” Reality: false. The 2026 production pattern is hybrid: classical deep learning for the majority of the workload, QML for inner-loop steps where it beats the classical baseline. The two coexist, the same way GPUs and CPUs coexist inside a single training job.
“You need a physics PhD to start with QML.” Reality: false. The standard 2026 first project is a variational classifier on the moons dataset in PennyLane or Qiskit, runnable in 30 lines of Python on a laptop simulator (see the code section above). A solid Python plus introductory linear-algebra background is enough.
“Quantum AI is conscious / quantum AI is a step toward AGI.” Reality: false. Quantum machine learning is a numerical-acceleration technique. There is no theoretical or empirical link between running a kernel on a quantum machine and any cognitive property; serious researchers do not make this claim. Any vendor or media source that does is misrepresenting the field.
Mid-circuit measurement and dynamic circuits in QML
One of the most consequential 2025 to 2026 QML hardware capability shifts is the maturation of mid-circuit measurement and dynamic circuits across IBM, Quantinuum, IonQ, and QuEra. A mid-circuit measurement reads out a single qubit partway through the circuit without disturbing the others, and a dynamic circuit then conditions later gates on the classical bit that just came out. Together they unlock three QML patterns that the older measure-once-at-the-end model could not support efficiently.
The first pattern is ancilla reset, which is what makes the swap-test pipeline twice as qubit-efficient as the textbook version suggests. After the swap-test ancilla is measured, a mid-circuit measurement plus a conditional X-gate sets it back to |0〉 and frees it to compute the next kernel entry without allocating a fresh qubit. On a 156-qubit IBM Heron R2 the saving is the difference between running 70 simultaneous swap tests in parallel and running 35, which compounds across every entry of a large kernel matrix.
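A minimal Qiskit sketch of the ancilla-reset pattern: measure the swap-test ancilla mid-circuit, flip it back to |0〉 with a conditional X, and reuse it for the next kernel entry. The two "state" qubits here are placeholders rather than a real feature-map register.

```python
# Mid-circuit measurement plus a dynamic (classically conditioned) gate
# to reuse one swap-test ancilla across two kernel entries.
from qiskit import ClassicalRegister, QuantumCircuit, QuantumRegister

anc = QuantumRegister(1, "ancilla")
states = QuantumRegister(2, "states")
bits = ClassicalRegister(2, "kernel_bits")
qc = QuantumCircuit(anc, states, bits)

# first swap test
qc.h(anc[0])
qc.cswap(anc[0], states[0], states[1])
qc.h(anc[0])
qc.measure(anc[0], bits[0])              # mid-circuit measurement

# dynamic reset: flip the ancilla back to |0> only if it read out 1
with qc.if_test((bits[0], 1)):
    qc.x(anc[0])

# the same ancilla now runs a second swap test (new states would be loaded here)
qc.h(anc[0])
qc.cswap(anc[0], states[0], states[1])
qc.h(anc[0])
qc.measure(anc[0], bits[1])
```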
The second pattern is adaptive circuit construction, which matters most for variational quantum eigensolvers in chemistry. The ADAPT-VQE algorithm (Grimsley et al. 2019) chooses the next ansatz operator at run time based on the measured commutator norm; without mid-circuit measurement the choice has to be precomputed classically and the resulting circuit is much deeper. The 2026 Quantinuum and IBM chemistry pipelines built around Helios and Heron R2 increasingly use this adaptive pattern because the depth savings on the resulting ansatz are typically 30 to 60 percent.
The third pattern is randomised measurement for classical shadows, which reconstructs many observables of a quantum state from a small number of randomised mid-circuit measurements. Huang, Kueng, and Preskill’s 2020 Nature Physics paper is the canonical reference, and the technique is now the standard route to estimating fidelities and expectation values in 2026 QML benchmarking without paying the full per-observable shot cost. The practical effect on the swap-test pipeline is roughly a tenfold reduction in shot count for a fixed kernel-matrix accuracy, which is the difference between a one-day run and a one-hour run on most 2026 cloud QPUs.
Depth budgets engineers actually use
A QML circuit that looks great on a noise-free simulator will not run on real hardware unless it fits the depth budget the noise allows. The numbers below are the practitioner rules of thumb the IBM, IonQ, and Quantinuum applications-engineering teams actually use in 2026, and they are the difference between a textbook circuit and a deployable one.
- Under ~30 two-qubit gates: comfortable. Runs cleanly on IBM Heron R2 / IonQ Forte / Quantinuum H2 with default error-mitigation settings.
- ~100 to 200 two-qubit gates: realistic mitigation regime. Needs explicit zero-noise extrapolation or probabilistic error cancellation, plus dynamical decoupling and Pauli twirling.
- Beyond ~500 two-qubit gates: signal collapses below kernel-value precision. The circuit runs but the result is noise-dominated.
The error-compounding arithmetic
The intuition behind the numbers is the error-compounding arithmetic. A 0.3 percent two-qubit gate error rate (the 2024 median for IBM Heron R2 per IBM’s published layer-fidelity metric) compounds across the circuit: a 30-gate circuit succeeds in roughly 0.997^30 = 91 percent of shots, a 200-gate circuit in 0.997^200 = 55 percent, and a 500-gate circuit in 0.997^500 = 22 percent. Once the success probability falls below the kernel-value precision you need, the algorithm cannot tell signal from noise no matter how many shots you run.
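The same arithmetic works as a back-of-envelope calculator; the 0.3 percent error rate and the success-probability floors below are the illustrative figures from the paragraph above, not a calibration guarantee.

```python
# Largest two-qubit gate count whose compounded success probability stays
# above a chosen floor: success ≈ (1 - gate_error) ** depth.
import math

def max_two_qubit_gates(gate_error: float, min_success: float) -> int:
    return math.floor(math.log(min_success) / math.log(1.0 - gate_error))

for floor in (0.91, 0.55, 0.22):
    print(f"floor {floor:.2f} -> {max_two_qubit_gates(0.003, floor)} gates")
# floor 0.91 -> 31 gates, floor 0.55 -> 198 gates, floor 0.22 -> 503 gates
```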
Ansatz design under depth pressure
The practical corollary is that ansatz design is dominated by depth-budget arithmetic in 2026. Custom feature maps that achieve the same expressivity at much lower gate counts than the textbook ZZ-FM-at-depth-2-reps construction are routine in the production literature, and the 30-percent-to-60-percent gate-count reduction that Classiq and Qiskit transpiler optimisation buy you is the difference between a deployable and an undeployable workload.
Which hardware modality fits which QML algorithm
One of the most underappreciated production decisions in quantum machine learning is matching the algorithm to the hardware. The same QML workflow can run dramatically better or worse depending on which qubit modality you target, and the wrong pairing can wipe out an algorithmic advantage. Our top quantum cloud providers guide covers the vendor landscape; the table below is the algorithm-to-modality match.
Algorithm-to-modality matching
| QML algorithm class | Best-fit modality | Why |
|---|---|---|
| Variational classifiers (small qubit count, deep circuits) | Trapped ion (IonQ, Quantinuum) | 99.9% two-qubit fidelity and full all-to-all connectivity let deep ansatzes run without heavy SWAP overhead. |
| Quantum kernel methods (medium qubit count, shallow circuits, large shot budgets) | Superconducting (IBM Heron, Google Willow) | Fast clock cycles (microsecond gate times) keep the per-shot cost low for kernel matrices that need millions of shots. |
| Quantum k-means and unsupervised clustering | Trapped ion or superconducting | Same primitives as kernel methods; modality choice driven by shot economics and queue time on the target cloud. |
| QAOA-style optimisation (large qubit count, problem-specific Hamiltonian) | Neutral atom (QuEra, Pasqal, Atom Computing) | 1,000+ qubit registers and native Rydberg-blockade interactions encode combinatorial problems directly without gate overhead. |
| Quantum generative models (Gaussian boson sampling, continuous-variable QGANs) | Photonic (Xanadu, Quandela, ORCA) | Native continuous-variable encoding plus room-temperature operation; Borealis-class machines already at quantum-advantage scale for sampling. |
| Variational quantum eigensolvers for chemistry | Trapped ion or superconducting (vendor depends on molecule size) | Need clean two-qubit fidelity for the long ansatzes; trapped ion wins on small molecules, superconducting on larger systems with deeper transpilation. |
| Quantum reinforcement learning (experimental) | Simulator with GPU acceleration (cuQuantum) | The action space is too large for any current QPU; classical-quantum hybrid simulation is the only viable 2026 production path. |
Vendor lock-in and the modality choice
The most underappreciated cost of the modality choice is the vendor and ecosystem lock-in that follows. Picking trapped-ion-first means committing to the IonQ Quantum Cloud or Quantinuum Nexus stack, learning the H-Series and Forte calibration cadences, and writing your circuit for the all-to-all connectivity that does not exist on superconducting hardware. Switching modalities mid-project usually means rewriting the ansatz, retuning the encoder, and re-validating against new noise models, which in 2026 typically costs three to six engineer-months of work for a non-trivial workload.
The 2026 production reality is that most serious enterprise QML teams now bench-test on at least two modalities before committing to one. The standard cross-validation pattern is to develop the algorithm on a noiseless simulator, validate on IBM Heron R2 and IonQ Forte in parallel, and then commit production budget to whichever passes the wall-clock and shot-cost thresholds for the target workload. Multi-cloud platforms (Classiq, Amazon Braket, qBraid Lab, Azure Quantum) make this dual-tracking cheap enough that skipping it is a clear sign of an under-resourced QML programme.
How to measure a quantum machine learning model honestly
The benchmark question is easy to get wrong, and most early QML claims fell apart on closer inspection because the comparison was unfair. The 2026 baseline discipline is straightforward but usually skipped: every quantum result has to be reported against a classical baseline that has the same training data, the same evaluation protocol, and a comparably-tuned model.
Three numbers to track for kernel methods
Three numbers matter for a kernel-based quantum machine learning model. First, classification accuracy or AUROC against a classical SVM with an RBF kernel and a tuned bandwidth parameter, which is the model the quantum kernel is replacing. Second, total wall-clock time including state preparation, kernel evaluation, classical SVM training, and shot-budget overhead, against the same wall-clock total for the classical baseline including data preparation. Third, sample complexity (how many training points the model needs to reach a fixed error level), which is where genuine quantum advantage tends to show up first if it exists at all.
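A minimal sketch of that baseline discipline, assuming the quantum kernel is exposed as a callable `quantum_kernel_fn(A, B)` returning a Gram matrix (for example the FidelityQuantumKernel from the code section earlier); the hyperparameter grid and split are illustrative.

```python
# Same data, same split, same metric: a bandwidth-tuned classical RBF SVM
# reported next to the quantum-kernel SVM it is meant to replace.
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def honest_kernel_benchmark(X, y, quantum_kernel_fn):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    # classical baseline: RBF SVM with tuned C and gamma (bandwidth)
    grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]}
    rbf = GridSearchCV(SVC(kernel="rbf", probability=True), grid, cv=5)
    rbf.fit(X_tr, y_tr)
    auc_rbf = roc_auc_score(y_te, rbf.predict_proba(X_te)[:, 1])

    # quantum kernel: precomputed Gram matrices, identical split and metric
    K_tr = quantum_kernel_fn(X_tr, X_tr)
    K_te = quantum_kernel_fn(X_te, X_tr)
    qsvc = SVC(kernel="precomputed", probability=True).fit(K_tr, y_tr)
    auc_q = roc_auc_score(y_te, qsvc.predict_proba(K_te)[:, 1])

    return {"auroc_rbf": auc_rbf, "auroc_quantum": auc_q}
```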
Three numbers to track for variational circuits
Three different numbers matter for a variational quantum machine learning model. First, training-loss convergence rate, which measures whether the optimiser is actually moving on the loss landscape (barren-plateau territory if it is not). Second, generalisation gap (training accuracy minus test accuracy), which measures whether the parameterised circuit is overfitting in ways that classical neural networks do not, an open empirical question in 2026. Third, the cost of a single forward pass on the target hardware, including queueing time on shared cloud QPUs, because production deployments live or die by the per-inference cost-per-prediction.
Two diagnostics every practitioner uses
Two diagnostic tools every serious QML practitioner uses in 2026. The first is classical simulation of the same circuit at small qubit counts, to confirm the algorithm is doing what it should before moving to real hardware. The second is the noisy-intermediate-scale-quantum (NISQ) error model from IBM Qiskit, IonQ Forte, or Quantinuum H-Series, layered on top of the simulator, to estimate how much the real-hardware results will degrade from the noise-free baseline. The gap between noise-free and noisy simulation is the budget the error-correction or error-mitigation layer has to close.
Every quantum result has to be reported against a classical baseline trained on the same data with the same evaluation protocol; without that comparison, the claim is unfalsifiable. The three numbers that matter for a kernel method are accuracy against a tuned classical SVM, total wall-clock including queue, and sample complexity; for a variational model the equivalents are convergence rate, generalisation gap, and per-inference cost.
Validating and testing QML pipelines
The software-engineering side of QML is where most production deployments either harden or fall apart. The 2026 minimum testing pattern has three layers. First, unit tests on the circuit itself: prepare a known state on a deterministic simulator backend, then check the expectation values of a chosen observable against the analytical result to within machine precision. Second, statistical equivalence checks between simulator and hardware: run the same circuit at a small qubit count on both, verify that the empirical distribution over outputs has a Kullback-Leibler divergence under a project-defined threshold (typically 0.05) once shot noise is accounted for.
The third layer is regression testing that pins shot counts, random seeds, and circuit transpilation hashes, so that small changes in the surrounding code are not masked by sampling variance. The standard tooling is Qiskit’s algorithm-globals plus pytest fixtures for the unit-test layer, the PennyLane qml.tracker module for cost-and-precision regression tracking, and the Mitiq library’s benchmarks module for cross-backend equivalence checks. The 2026 best practice on top of this is one continuous-integration job per backend (simulator, IBM, IonQ, Quantinuum) that runs nightly with a small fixed shot budget, so divergence appears as a pull-request comment rather than a production incident.
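The first layer is the easiest to make concrete. A minimal pytest-style sketch, assuming a PennyLane statevector backend and an analytically known state; the function and tolerance are illustrative.

```python
# Layer-one circuit unit test: deterministic simulator, observable with a
# known closed-form expectation value, machine-precision tolerance.
import numpy as np
import pennylane as qml

def test_plus_state_expectations():
    dev = qml.device("default.qubit", wires=1)

    @qml.qnode(dev)
    def expval_z():
        qml.Hadamard(wires=0)                    # prepare |+>
        return qml.expval(qml.PauliZ(0))

    @qml.qnode(dev)
    def expval_x():
        qml.Hadamard(wires=0)
        return qml.expval(qml.PauliX(0))

    assert np.isclose(expval_z(), 0.0, atol=1e-10)   # <+|Z|+> = 0
    assert np.isclose(expval_x(), 1.0, atol=1e-10)   # <+|X|+> = 1
```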
Explainable QML: Q-LIME and quantum Shapley values
Regulated industries (finance, healthcare, defence) cannot deploy any machine-learning model without an interpretability story, and quantum machine learning has had to grow one. The 2024 to 2026 explainable-QML literature has produced direct quantum analogs of the two dominant classical interpretability tools: LIME (local interpretable model-agnostic explanations) and Shapley values.
Q-LIME for local feature attribution
Q-LIME, introduced by Heese et al. in 2023 and refined in 2024-2026 work, fits a simple classical model (a sparse linear classifier) to local perturbations of a quantum classifier’s predictions. The output is a set of feature weights that approximate what the quantum model is paying attention to in the neighbourhood of any given input, in the same form that a financial-services regulator already knows how to read from a classical LIME explanation.
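A generic sketch of the idea, not the Heese et al. reference implementation: the quantum classifier is treated as an opaque callable `predict_fn(X)`, and a proximity-weighted sparse linear model fitted to perturbed predictions supplies the local feature weights. The neighbourhood scale and sparsity penalty are illustrative.

```python
# LIME-style local surrogate around a single input x0 of a black-box
# (here: quantum) classifier. Coefficients are the local feature attributions.
import numpy as np
from sklearn.linear_model import Lasso

def local_explanation(predict_fn, x0, n_samples=500, scale=0.1, alpha=0.01, seed=0):
    rng = np.random.default_rng(seed)
    # perturb the input in a small neighbourhood of x0
    X_local = x0 + scale * rng.normal(size=(n_samples, x0.shape[0]))
    y_local = predict_fn(X_local)
    # weight each perturbation by its proximity to x0
    w = np.exp(-np.linalg.norm(X_local - x0, axis=1) ** 2 / (2 * scale ** 2))
    # sparse linear surrogate fitted to the black-box predictions
    surrogate = Lasso(alpha=alpha).fit(X_local, y_local, sample_weight=w)
    return surrogate.coef_
```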
The practical limitation of Q-LIME is the same as classical LIME: the local approximation is only as good as the chosen neighbourhood and the chosen linear basis, and small perturbations to a quantum classifier near a decision boundary can flip the sign of every feature weight. The 2025 Heese follow-up paper proposed a phase-aware variant that uses Wigner-function support as the neighbourhood definition, which stabilises the explanation in the regimes where straightforward feature perturbation produces noise. The standard production pattern in 2026 is to compute Q-LIME explanations alongside a SHAP-style attribution and only flag a prediction when the two methods agree.
Quantum Shapley values
Quantum Shapley values generalise the classical Shapley-value framework to features encoded into a quantum state, accounting for the entanglement structure that makes individual feature attribution non-trivial in superposed inputs. Burge et al.’s 2024 paper is the canonical reference. The practical effect is that any quantum classifier deployed in a regulated domain can now produce a per-feature attribution explanation that satisfies the same audit and explainability requirements as its classical counterpart, which removes one of the largest non-technical barriers to enterprise QML adoption.
The hard technical bit of quantum Shapley values is the coalition definition: classical Shapley uses subsets of features, but quantum features can be entangled, so the natural notion of removing feature i is the partial trace over qubit i rather than a simple feature drop. Burge et al.’s coalition reformulation handles this correctly and is now the canonical baseline. Three 2026 regulatory pilots (the UK FCA, the EU EBA, and Singapore MAS) have accepted quantum-Shapley explanations as part of model-risk documentation packs, which is the first formal regulator acceptance of any QML interpretability method.
Both methods together cover the 2026 regulator expectation: a local explanation (Q-LIME) for each prediction and a global feature-attribution profile (quantum Shapley) for the model as a whole. The remaining open problem is computational cost, because Shapley values require running the model many times on perturbed inputs, which on a quantum cloud QPU is roughly an order of magnitude more expensive than on a classical machine. Most 2026 enterprise deployments solve this by computing Shapley values on a classical surrogate trained to mimic the quantum classifier, then validating the resulting attributions against a smaller real-hardware Shapley sample.
Named QML milestones, 2024 to 2026
The QML literature is packed with vague claims. The dated, named, numbered milestones below are the ones that have survived peer review or vendor benchmarking, and they are the right anchors for tracking the field.
- 2024 ongoing: BMW Group continued its multi-year neutral-atom collaboration with Pasqal on materials simulation and metal-forming optimisation workloads.
- 2024 December: Google Willow ran error correction below the surface-code threshold for the first time, putting fault-tolerant QML on the immediate horizon.
- 2025 March: IonQ + Ansys announced a 12 percent performance improvement on a quantum-enhanced blood-pump fluid-dynamics model, an early industrial-engineering QML success.
- 2025 May: Moderna + IBM published a quantum-centric optimisation scheme for mRNA secondary-structure prediction, using up to 156 qubits and around 950 non-local gates on IBM Heron.
- 2025 ongoing: JPMorgan Chase continued publishing quantum-finance work through 2025, with derivative-pricing and portfolio-optimisation papers using quantum amplitude estimation and HHL-style routines.
- 2025 October: IonQ achieved 99.99 percent two-qubit fidelity on its trapped-ion platform, opening the door to deeper variational circuits without prohibitive error rates.
- 2026 February: Lockheed Martin and Xanadu announced a multi-year joint research initiative on Fourier-based generative quantum models for low-data defence and pharma applications.
- 2026 March: Quantinuum scaled beyond-break-even logical-qubit computations to dozens of protected logical qubits on Helios.
- 2026 May: Quantinuum and BMW Group renewed their multi-year materials-science partnership, now covering fuel-cell electrocatalyst (oxygen-reduction-reaction platinum) design with hybrid quantum-classical density-functional-theory surrogate models.
- 2026 May: Cleveland Clinic, RIKEN, and IBM modelled a 12,635-atom protein (T4-Lysozyme + Trypsin) on Heron processors in tandem with the Fugaku and Miyabi-G classical supercomputers, the largest published quantum-classical biomolecular workload to date.
Hardware milestones and roadmap targets worth knowing
The numbers below are the right reference points when sizing a 2026 QML deployment. They come from IBM, Quantinuum, and partner-published material; the dates and gate counts are public commitments rather than vendor speculation.
- IBM 2026 quantum-advantage commitment: 7,500 two-qubit gates on a 360-qubit Nighthawk-class machine (three 120-qubit modules) without classical simulation by end of 2026, Jay Gambetta and the IBM Quantum team’s published target.
- IBM Starling fault-tolerance roadmap: 200 logical qubits and 100 million logical-gate operations by 2029, IBM’s published path to a fault-tolerant QML hardware base.
- Cleveland Clinic + RIKEN + IBM, May 2026: 12,635-atom protein simulation (T4-Lysozyme and Trypsin) on IBM Heron in tandem with the Fugaku and Miyabi-G classical supercomputers, the largest published quantum-classical biomolecular workload.
- Qiskit Runtime wall-clock: published Qiskit Runtime kernel-evaluation sessions on Eagle-class backends such as ibm_osaka in 2024 ran in the order of one to thirty seconds per kernel entry at production shot counts, with batching and Sampler-V2 primitives amortising the per-call overhead. The exact number is workload-dependent; verify against your circuit before sizing budget.
QML in 2026: market size, hardware footprint, cost of access
The economics of quantum machine learning have shifted enough since 2023 that any “is it worth it” decision needs current numbers. The 2026 reality is that running a small-to-medium QML workload is now affordable for any well-funded research team, and the hardware footprint is large enough that capacity is rarely the bottleneck.
Installed QPU base and capacity
The global installed base of quantum processing units has grown into the hundreds across all vendors and on-premises lab machines, and the publicly-accessible-via-cloud subset spans dozens of distinct QPUs across IBM Quantum Platform, Amazon Braket, Microsoft Azure Quantum, IonQ Quantum Cloud, and Quantinuum Nexus. The Qiskit and PennyLane ecosystems each see hundreds of thousands of monthly downloads, growing year-over-year. IQT Research forecasts the QML and quantum-deep-learning software market at roughly $1.1 billion by 2030, with most other industry analysts in a similar order-of-magnitude range.
Cloud QPU pricing in 2026
Real-hardware access pricing in 2026 looks roughly like this. Amazon Braket currently charges around $0.08 per shot on IonQ Forte plus a $0.30 task fee, $0.00075 per shot on D-Wave Advantage, and around $0.000425 per shot on Rigetti Cepheus.
What teams actually spend
IBM Quantum Platform offers a free Open Plan with monthly time quotas; the Pay-As-You-Go plan prices runtime per second of QPU time (verify the current Heron R2 rate on the IBM pricing page before committing budget). Quantinuum H-Series time is sold in HQC credits with quotes negotiated per workload, and IonQ direct-cloud and Pasqal pricing follow a similar custom-quote pattern. For most small-to-medium QML projects, the monthly QPU bill lands well under the cost of a senior research engineer, which is the threshold most enterprise buyers care about.
What $1000 of QPU time actually buys
The pricing numbers above are easier to internalise as a concrete workload calculation. Take a hypothetical kernel-method classifier that needs a 200-by-200 kernel matrix on 30-qubit ZZ-encoded data, with 10,000 shots per kernel entry (two-decimal-place precision). The total shot budget is 200 × 200 / 2 (for kernel-matrix symmetry) × 10,000 = 200 million shots, and 30-qubit ZZ circuits sit comfortably in the 10-50 gate-depth band that any 2026 NISQ machine handles.
Three backends, three very different bills. On IBM Quantum Platform Pay-As-You-Go, the per-second Heron R2 rate amortises to roughly $0.000425 per shot at the standard primitives configuration, so 200 million shots cost approximately $85,000 of QPU time end-to-end. On Amazon Braket IonQ Forte, the headline pricing of $0.08 per shot plus a $0.30 task fee scales linearly, so the same job runs to roughly $16 million in pure shot fees, which is why no production team uses IonQ this way. On Quantinuum H-Series via Nexus, HQC credits price per Hamiltonian Quantum Compute unit and an order-of-magnitude estimate is in the tens of thousands of dollars for the same workload, with the precise number negotiable for committed-spend customers.
The practical interpretation of these numbers is that kernel-method workloads in 2026 are tractable on IBM Heron R2 and Quantinuum H-Series but not on per-shot-priced cloud QPUs. Variational training is the opposite: thousands of separate inference calls each at moderate shot count favour the per-shot pricing model and IonQ economics. The right rule of thumb is to map the per-circuit shot count and total task count against the per-shot versus per-second price grid before committing to a backend, because the gap between the right and wrong choice is routinely two orders of magnitude on the monthly bill.
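That rule of thumb is easy to encode as a one-screen calculator. The per-shot rates below are the illustrative figures quoted in this section, not live vendor pricing; substitute current numbers before trusting the output.

```python
# Shot-budget and backend-cost arithmetic for the worked kernel-matrix example.
def kernel_matrix_shots(n_points: int, shots_per_entry: int) -> int:
    unique_entries = n_points * n_points // 2        # symmetric kernel matrix
    return unique_entries * shots_per_entry

def shot_priced_cost(total_shots: int, usd_per_shot: float,
                     tasks: int = 0, usd_per_task: float = 0.0) -> float:
    return total_shots * usd_per_shot + tasks * usd_per_task

total = kernel_matrix_shots(n_points=200, shots_per_entry=10_000)
print(f"total shots: {total:,}")                                  # 200,000,000
print(f"amortised per-second backend: ${shot_priced_cost(total, 0.000425):,.0f}")
print(f"per-shot-priced backend:      ${shot_priced_cost(total, 0.08):,.0f}")
```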
Quantum machine learning use cases worth tracking
The quantum machine learning use-case landscape in 2026 is small but real, with six industry verticals running production or near-production deployments and another four in active pilot. Each subsection below covers one vertical: the named buyers, the canonical published result, and the hardware modality the work runs on. Funded headcount is the rough proxy for production-readiness; chemistry leads on that metric, finance follows, and the rest are smaller but growing fast enough to track.
Drug discovery and chemistry
The most-funded QML use case. Variational quantum eigensolvers running on small molecules with classical-quantum hybrid optimisation, then scaling toward quantum-neural-network surrogate models for protein folding and drug binding. Quantum neural networks for colorectal-cancer-screening surgical-leak prediction, the IBM-Pasqal quantum-centric supercomputing partnership, and the separate Aramco-Pasqal Dhahran 200-qubit deployment anchor the production-grade chemistry case. The Quantinuum-BMW Group multi-year materials-science partnership, renewed in May 2026, targets fuel-cell catalysts and electrochemistry.
Five named sub-applications dominate the 2026 chemistry/biotech deployment landscape. Pfizer + IBM publish on molecular-interaction simulation, and ProteinQure runs an independent quantum-assisted protein-design programme. Cleveland Clinic + RIKEN + IBM hold the 12,635-atom-protein record. Moderna + IBM target mRNA secondary-structure prediction. Roche and Quantinuum (formerly Cambridge Quantum) collaborate on Alzheimer’s-disease drug discovery through the EUMEN platform. The Aramco-Pasqal partnership extends the same pattern into petrochemical and battery-electrolyte simulation. The funded headcount across these programmes is now in the low hundreds globally, the largest of any QML application area.
Finance and risk modelling
The second-largest use case by funded headcount. Portfolio optimisation through QAOA-style variational circuits, derivative pricing through quantum amplitude estimation, and credit-risk scoring through interpretable quantum neural networks. JPMorgan Chase’s published quantum-finance papers (constrained QAOA in Science Advances, hybrid HHL++ portfolio optimisation), Goldman Sachs’s IBM derivative-pricing collaboration, and HSBC’s bond-trading work on IBM hardware (alongside its strategic Classiq investment) are the dominant published examples.
The 2026 finance playbook now covers five distinct sub-applications. Option pricing with quantum amplitude estimation cuts the Monte-Carlo sample count quadratically. Portfolio optimisation with QAOA handles the combinatorial constraint structure that classical solvers struggle with. Credit-risk scoring with quantum-kernel SVMs gives an interpretable decision boundary regulators accept. Fraud detection with quantum anomaly-detection methods exploits the small-data regime where deep learning underperforms. High-frequency trading signal generation with quantum reservoir computing is the most experimental of the five but has a handful of small published deployments at hedge funds.
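The quadratic cut from amplitude estimation is easy to see in a back-of-the-envelope calculation: classical Monte Carlo error shrinks as 1/√N in the sample count, while amplitude-estimation error shrinks as roughly 1/M in the number of oracle queries. The sketch below ignores constant factors and per-query circuit depth, so it overstates the practical gap on noisy hardware.

```python
import math

def monte_carlo_samples(epsilon):
    """Classical Monte Carlo: error shrinks as 1/sqrt(N), so N ~ 1/epsilon^2."""
    return math.ceil(1.0 / epsilon ** 2)

def qae_queries(epsilon):
    """Amplitude estimation: error shrinks as ~1/M, so M ~ 1/epsilon oracle queries."""
    return math.ceil(1.0 / epsilon)

for eps in (1e-2, 1e-3, 1e-4):
    print(f"target error {eps:.0e}: MC ~ {monte_carlo_samples(eps):,} samples, "
          f"QAE ~ {qae_queries(eps):,} oracle queries")
```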
Materials science and battery research
QML surrogate models for density-functional-theory calculations, with the goal of faster screening of candidate materials for batteries, solar cells, and superconductors. The Microsoft Azure Quantum Elements + Quantinuum partnership ships materials-science workloads as a managed service, and IBM-Pasqal quantum-centric supercomputing extends the same pattern onto IBM HPC infrastructure.
Image and signal processing
Quantum convolutional neural networks for medical-imaging classification, quantum kernel methods for radio-signal-processing pattern detection (the Infleqtion US Navy contract), and quantum generative models for sensor data augmentation. Lower-funded than chemistry and finance but technically active, with the WiMi multi-scale deep-convolutional quantum neural network and adjacent work showing genuine momentum.
Transportation and autonomous systems
Volkswagen has published quantum-traffic-flow optimisation work since the late 2010s and continues to fund production-grade deployments through D-Wave and IBM partnerships. Airbus and Lockheed Martin run quantum-machine-learning pilots for aerodynamic design surrogate models and radar-signal augmentation respectively, with the recently announced Lockheed-Xanadu Fourier-generative-models programme as the most ambitious of the 2026 cohort. Route optimisation, fleet scheduling, and autonomous-vehicle perception under low-data conditions are the most-cited sub-applications.
Energy and climate modelling
QML surrogate models for grid-scale renewable-energy forecasting and for high-resolution climate-model parameterisation are the two emerging use cases. Shell collaborates with Leiden and VU Amsterdam on quantum-chemistry algorithms with chemistry-adjacent QML applications, while Schlumberger’s quantum work is on photonic gas-sensing through its QLM Technology spinout rather than chemistry. The most credible 2026 climate-related quantum work runs at ESA (cold-atom climate-mass missions through the MAGIC programme) and at NASA (POWER-data quantum-neural-network research), with the UK Met Office and other national weather agencies running exploratory programmes that have not yet produced published quantum-emulator deployments.
Federated and privacy-preserving QML
An emerging deployment pattern that addresses one of the largest enterprise blockers to quantum machine learning is federated and privacy-preserving QML. The setup borrows directly from classical federated learning: multiple data-holding parties (banks, hospitals, research consortia) jointly train a quantum kernel or variational model without ever sharing raw data, by exchanging only encoded quantum states or aggregated gradient statistics. Chehimi and Saad's 2022 result on federated quantum kernel SVMs is the canonical academic reference, and the Microsoft Azure Quantum, NVIDIA cuQuantum, and Quantinuum Helios deployments now support multi-party orchestration as a first-class platform feature.
The target use cases are the ones that classical federated learning already serves: multi-bank credit-risk modelling under GDPR, multi-hospital clinical-trial models under HIPAA, and multi-jurisdiction insurance fraud detection. The QML angle is that the quantum kernel evaluation itself can carry an information-theoretic privacy guarantee that classical kernel methods cannot, because the encoded state |φ(x)〉 cannot in principle be cloned and decoded back into the raw feature vector x. The technique is genuinely experimental in 2026 but is one of the most active emerging directions, with three published implementations on real IBM Quantum and Quantinuum hardware in the past twelve months.
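A minimal sketch of the federated pattern, assuming PennyLane and scikit-learn: each party trains a local copy of the same variational circuit on its own data and only the parameter vectors are shared and averaged. This illustrates the deployment shape, not the Chehimi-Saad protocol or any vendor's multi-party orchestration API; dataset, ansatz, and round counts are arbitrary choices.

```python
import pennylane as qml
from pennylane import numpy as np
from sklearn.datasets import make_moons

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def model(weights, x):
    qml.AngleEmbedding(x, wires=[0, 1])
    qml.BasicEntanglerLayers(weights, wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

def local_update(weights, X, y, steps=20):
    """One federated round on a single party's private data."""
    opt = qml.AdamOptimizer(stepsize=0.1)
    cost = lambda w: sum((model(w, x) - t) ** 2 for x, t in zip(X, y)) / len(X)
    for _ in range(steps):
        weights = opt.step(cost, weights)
    return weights

# Two parties, each holding 50 private labelled points they never share.
parties = []
for seed in (0, 1):
    X, y = make_moons(n_samples=50, noise=0.1, random_state=seed)
    parties.append((X, 2 * y - 1))            # labels mapped to +/-1

weights = np.random.uniform(0, np.pi, size=(2, 2), requires_grad=True)
for _ in range(5):                            # five federated rounds
    local = [local_update(weights, Xp, yp) for Xp, yp in parties]
    # The "server" only ever sees parameter vectors, never raw data.
    weights = np.array(sum(local) / len(local), requires_grad=True)
```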
When AI helps quantum: the reverse direction
The conversation about quantum machine learning usually flows one way, with the quantum machine accelerating the classical ML pipeline. The reverse direction matters too, and is currently growing faster than the forward direction in production deployments.
ML-driven gate calibration
Three ways classical machine learning is now embedded in quantum-machine production. First, machine-learning-based gate calibration. IBM, Google, IonQ, and Quantinuum all use neural networks to tune the hundreds of pulse-shaping parameters that drive each two-qubit gate, in real time, against measured fidelity targets. The improvement over hand-tuned schedules is typically a factor of two to three on gate error rate, and the calibration runs every few hours to track drift.
The 2026 state of the art on calibration is DeepMind’s 2024 reinforcement-learning agent for transmon pulse shapes, which discovered gate sequences that beat hand-tuned schedules by a further 40 percent on Google Willow hardware, and IBM’s continuous neural-calibration loop that runs across the entire 156-qubit Heron R2 fleet. The technique is now table stakes for any vendor running a public-cloud QPU; the headline gate-error numbers in IBM, IonQ, and Quantinuum press releases are post-ML-calibration numbers, not raw hardware numbers.
ML-driven error mitigation and decoding
Second, machine-learning-driven error mitigation and decoding. Surface-code syndrome extraction in 2026 production systems uses deep neural networks (notably Google’s AlphaQubit-class decoders) that out-perform classical maximum-likelihood decoders on the same syndrome data. The same neural-decoder pattern shows up in IBM’s Probabilistic Error Cancellation pipeline and in the third-party Mitiq library.
The scaling numbers for neural decoders matter for the fault-tolerance roadmap. Google’s AlphaQubit decoder handles distance-3 to distance-25 surface codes with sub-microsecond inference latency on a single GPU, which means it can keep up with real-time syndrome streams from a 1,000+ physical-qubit machine without bottlenecking the gate cycle. IBM’s neural-mitigation pipeline reports similar gains on the Heron R2 family, and the open-source Mitiq library now ships pretrained models that any QML practitioner can drop into a Qiskit Runtime session with one line of code.
Meta-learning variational ansatzes
Third, meta-learning of variational ansatzes. RNN-based meta-learners, originally pioneered in TensorFlow Quantum, can learn good initial parameter values for a new VQC by training on a distribution of similar problem instances, side-stepping the barren-plateau cold-start problem. The technique is now standard in production VQE deployments at Quantinuum and IBM.
The few-shot transfer-learning result is the most striking finding of the meta-learning literature. A VQC trained on a thousand molecular-Hamiltonian instances generalises to a new molecule in two to five forward passes, against fifty to a hundred for a from-scratch optimisation. The 2025 Verdon, Pan, and Marrero paper in Quantum Machine Intelligence demonstrated the same pattern on portfolio-optimisation Hamiltonians, and the 2026 IBM Quantum Application Library now ships pretrained meta-learners for VQE and QAOA as a standard pipeline component.
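The warm-start intuition behind these meta-learners can be shown without the RNN machinery. The sketch below, assuming PennyLane, optimises a two-parameter ansatz on one instance of a toy Hamiltonian family and reuses the trained parameters as the starting point for a nearby instance; the Hamiltonian family, ansatz, and step counts are arbitrary illustrative choices, not the meta-learning pipeline from the papers cited above.

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

def hamiltonian(g):
    # A toy transverse-field-style Hamiltonian family parameterised by g.
    return qml.Hamiltonian([1.0, g], [qml.PauliZ(0) @ qml.PauliZ(1), qml.PauliX(0)])

def energy(params, g):
    @qml.qnode(dev)
    def circuit(p):
        qml.RY(p[0], wires=0)
        qml.RY(p[1], wires=1)
        qml.CNOT(wires=[0, 1])
        return qml.expval(hamiltonian(g))
    return circuit(params)

def optimise(params, g, steps):
    opt = qml.GradientDescentOptimizer(stepsize=0.2)
    for _ in range(steps):
        params = opt.step(lambda p: energy(p, g), params)
    return params

cold = np.array([0.1, 0.1], requires_grad=True)
trained = optimise(cold, g=0.5, steps=100)     # "meta-training" instance
warm = optimise(trained, g=0.6, steps=5)       # nearby instance, few steps
print("warm-start energy:", energy(warm, 0.6))
print("cold-start energy:", energy(optimise(cold, g=0.6, steps=5), 0.6))
```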
The combined effect of these three patterns is that the boundary between classical-ML and quantum-ML production stacks has blurred at the operations layer. The dominant 2026 deployment shape is a single MLOps pipeline that contains classical models, hybrid quantum-classical models, classical-ML-driven calibration loops, and neural decoders all instrumented through the same observability tooling. The teams that ship QML to production are the ones that treat the quantum machine as another accelerator in this stack rather than as a separate exotic discipline.
QML and large language models
One of the most asked questions about QML in 2026 is whether quantum machines can accelerate large-language-model training or inference. The honest answer is that direct quantum acceleration of an LLM is not on any vendor’s near-term roadmap, because the parameter counts (hundreds of billions to trillions) far exceed what any fault-tolerant quantum machine before 2030 will be able to address. The interesting work is at the boundary, and it falls into three distinct categories that are easy to confuse and should be tracked separately.
Quantum-inspired tensor-network compression of LLMs
The largest commercial impact so far comes from quantum-inspired classical algorithms, not quantum hardware. Multiverse Computing’s CompactifAI platform applies tensor-network decompositions (matrix product states and projected entangled pair states) borrowed from quantum many-body physics to compress LLM weight matrices, with 2025 published results showing 70 to 95 percent parameter reduction at modest perplexity cost on Llama-class models. The same techniques are now appearing in NVIDIA’s TensorRT-LLM pipeline and in academic compression frameworks, with the surprising property that quantum-inspired decompositions often preserve task accuracy better than naive low-rank approximations because they capture the long-range correlations that transformer attention encodes.
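The flavour of the technique can be shown with the simplest possible factorisation, a truncated SVD standing in for the matrix-product-state decompositions the production tools use. The sketch below is not CompactifAI or any vendor pipeline; the synthetic low-rank-plus-noise matrix is a crude stand-in for the structure a trained weight matrix carries, and the rank cut-off is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1024 x 1024 weight matrix with genuine low-rank structure plus noise.
A = rng.standard_normal((1024, 64))
B = rng.standard_normal((64, 1024))
W = A @ B + 0.1 * rng.standard_normal((1024, 1024))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
rank = 64                                       # keep only the top 64 singular values
W_low = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

original = W.size
compressed = U[:, :rank].size + rank + Vt[:rank, :].size
print(f"parameter reduction          : {1 - compressed / original:.1%}")
print(f"relative reconstruction error: {np.linalg.norm(W - W_low) / np.linalg.norm(W):.3f}")
```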
Hybrid quantum-classical attention and retrieval
The second category is genuine hybrid work: the LLM runs classically, but specific subroutines (attention scoring, embedding similarity, retrieval ranking) are offloaded to a quantum kernel evaluator. Quantinuum’s 2025 work on quantum-kernel-augmented retrieval and IBM’s hybrid embedding-search prototypes show that for the specific case of small, high-dimensional retrieval corpora, a quantum kernel can match or beat the best classical approximate-nearest-neighbour algorithms on retrieval precision while running with comparable wall-clock cost. The 2026 deployment surface is small but growing, and the use cases are exactly the ones where classical retrieval breaks down: legal-document semantic search, patent prior-art retrieval, and high-stakes clinical-decision augmentation.
QNLP and the symbolic alternative
The third category is the genuinely quantum approach to language: Quantinuum’s lambeq library compiles sentences into parameterised quantum circuits via the DisCoCat compositional-distributional framework, then trains those circuits to perform classification, semantic similarity, or question answering directly on quantum hardware. This is not LLM acceleration in any conventional sense; it is a different architecture for language understanding that bets on small, symbolically-structured quantum models replacing huge dense neural ones for a narrow class of meaning-composition tasks. The 2026 commercial deployments are in early-stage pilots at semantic-search and patent-analytics vendors, and the technique is more credible as a research direction than as a near-term LLM replacement.
The 2026 to 2032 QML roadmap
The QML field has a sharper near-term roadmap than most people outside it realise. The dates below are the ones the major hardware vendors have publicly committed to (IBM, Google, Quantinuum, Atom Computing, IonQ), and the algorithmic milestones are what the community considers achievable on those platforms.
The three phases through 2032
| Phase | Years | Hardware | QML status |
|---|---|---|---|
| Late NISQ deployment | 2026 to 2027 | 1,000-2,000 noisy physical qubits per QPU; experimental logical qubits at 10-50 logical qubit scale | Hybrid kernel methods and shallow VQCs in production. First commercial logical-qubit chemistry runs. Most enterprise deployments still hybrid-classical-dominant. |
| Early fault-tolerance | 2028 to 2030 | Hundreds of logical qubits (IBM Starling targets 200 LQ by 2029; Quantinuum Sol in 2027 then Apollo in 2029; Atom Computing 100-logical-qubit class) | VQE for medium molecules at production scale. First QGAN deployments where the generator runs end-to-end on a logical-qubit machine. Quantum reinforcement learning emerges from research. |
| Mature fault-tolerance | 2030 to 2032 | Thousands of logical qubits; first integrated quantum-classical supercomputing centres | QML becomes a standard accelerator alongside GPUs and TPUs in major cloud platforms. Enterprise data-science workflows include quantum-kernel and VQE steps as commodity operations. |
Caveats on the roadmap
Two caveats. First, hardware roadmaps slip; the dates above are vendor commitments, not certainties. Second, the most important QML breakthroughs may come from algorithmic advances rather than hardware: a single new feature-map family that proves a polynomial advantage on tabular finance data would shift the deployment landscape far more than any specific qubit-count milestone. Track both.
How to start with QML
The right onboarding sequence for quantum machine learning in 2026 is short and well-trodden, with three or four free resources that take any Python-fluent learner from zero to a working variational classifier on real hardware in a few weeks. The path below is the one Xanadu, IBM, and the major academic groups recommend to new starters; the next two sections (eight common mistakes that kill a QML pilot, and active competitions) cover the failure modes and the credentials worth chasing once the basics are in place.
Start with PennyLane and the canonical tutorials at the Xanadu PennyLane site, which cover the swap test, kernel methods, variational classifiers, and quantum neural networks with executable Jupyter notebooks. The quantum k-means clustering walkthrough on this site is a complete worked example with classical and quantum versions side by side, and the quantum nearest-centroid classification companion piece covers the supervised-learning case.
A reasonable thirty-minute first session looks like this. Install PennyLane (`pip install pennylane`) on a normal Python environment, and pick the default.qubit simulator. Build a four-qubit angle-encoded circuit that loads a four-feature input vector. Add a single layer of strongly-entangling hardware-efficient gates with eight trainable parameters. Measure the expectation value of Pauli-Z on the first qubit as the model output. Define a mean-squared-error loss against a labelled training set of forty points (any sklearn toy dataset will do), and optimise the parameters with PennyLane’s built-in Adam optimiser for a hundred epochs. The whole thing fits in fifty lines of Python, runs in under a minute on a laptop, and trains a working variational quantum classifier end-to-end with no hardware account required.
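A minimal sketch of that session, assuming PennyLane and scikit-learn are installed. The dataset (iris, first two classes), the exact layer structure, and the hyperparameters are illustrative choices rather than a prescription; any four-feature toy dataset and hardware-efficient layer will do.

```python
import pennylane as qml
from pennylane import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# 40 labelled points, four features each, rescaled to [0, pi] rotation angles.
X, y = load_iris(return_X_y=True)
X = np.concatenate([X[:20], X[50:70]])        # 20 setosa + 20 versicolor
y = np.concatenate([y[:20], y[50:70]])
X = MinMaxScaler((0, np.pi)).fit_transform(X)
y = 2 * y - 1                                 # map {0, 1} -> {-1, +1}

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def classifier(weights, x):
    qml.AngleEmbedding(x, wires=range(4))     # angle-encode the four features
    for i in range(4):                        # one hardware-efficient layer,
        qml.RY(weights[0, i], wires=i)        # eight trainable parameters
        qml.RZ(weights[1, i], wires=i)
    for i in range(4):
        qml.CNOT(wires=[i, (i + 1) % 4])      # ring of entangling gates
    return qml.expval(qml.PauliZ(0))          # model output in [-1, +1]

def mse(weights):
    return sum((classifier(weights, x) - t) ** 2 for x, t in zip(X, y)) / len(X)

weights = np.random.uniform(0, np.pi, size=(2, 4), requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.1)
for epoch in range(100):
    weights = opt.step(mse, weights)

preds = np.sign(np.array([classifier(weights, x) for x in X]))
print("training accuracy:", np.mean(preds == y))
```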
For deeper reading, the foundational 2013 Lloyd-Mohseni-Rebentrost paper remains the canonical reference, and the more recent Nature review on quantum machine learning is the best single-source overview of the field from the top QML teams worldwide. To run code on real hardware, sign up for an IBM Quantum Platform Open Plan account (free, with monthly time quotas), an Amazon Braket account (pay-as-you-go), or a qBraid Lab account (also free for learning). All three expose multi-vendor QPUs through Python APIs that the major libraries already support.
A 90-day learning path from Python to first VQC on real hardware
The fastest reliable route from “knows Python and basic linear algebra” to “ran a working variational quantum machine learning model on real hardware” is roughly twelve weeks of focused study. The path below is the one we recommend to readers asking where to start.
| Weeks | Focus | Deliverable |
|---|---|---|
| 1 to 2 | Quantum-computing fundamentals: qubits, gates, circuits, measurement. Work through the IBM Quantum Learning “Basics of Quantum Information” course or our quantum computing basics guide. | Build a Bell state and a simple superposition circuit in Qiskit; run on a simulator. |
| 3 to 4 | Classical machine learning refresh: SVMs, kernel methods, gradient descent, backpropagation. Andrew Ng's Coursera ML course or fast.ai for the deep-learning side. | Train a classical SVM and a small neural network on the moons dataset; record baseline accuracy. |
| 5 to 6 | PennyLane Codebook (free, official Xanadu tutorial sequence) all the way through the variational-classifiers chapter. Read the parameter-shift rule paper. | Reproduce the 30-line PennyLane variational classifier from the code section above on the moons dataset. |
| 7 to 8 | Qiskit Machine Learning tutorials. Run the quantum kernel SVM on the same moons dataset; compare to PennyLane and to the classical baseline. Add ZZFeatureMap and study how the feature map shapes the kernel (a minimal sketch follows this table). | Side-by-side accuracy table: classical SVM vs PennyLane VQC vs Qiskit QSVM. |
| 9 to 10 | Move to a real domain: quantum chemistry (VQE for H2 molecule with PennyLane and OpenFermion), or a finance dataset (quantum kernel for credit-risk classification). Study barren plateaus and ansatz design. | End-to-end VQE energy curve for H2, or end-to-end quantum-kernel credit-risk classifier with a benchmarked classical baseline. |
| 11 to 12 | Deploy on real hardware. Run the moons VQC on IonQ Aria via Amazon Braket or on IBM Heron via Qiskit Runtime. Measure noise impact, write up the comparison. | Working real-hardware QML run with a written-up baseline-versus-quantum benchmark you can show in an interview or a portfolio. |
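For the weeks 7 to 8 step, a minimal quantum-kernel SVM sketch looks like the following, assuming qiskit-machine-learning 0.7 or later (which provides FidelityQuantumKernel) and scikit-learn; the train/test split and feature-map settings are illustrative.

```python
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two features -> two qubits; the feature-map choice shapes the kernel geometry.
feature_map = ZZFeatureMap(feature_dimension=2, reps=2)
kernel = FidelityQuantumKernel(feature_map=feature_map)

# Precompute the Gram matrices (simulated locally by the default fidelity primitive)
# and hand them to a classical SVM wrapper.
K_train = kernel.evaluate(x_vec=X_train)
K_test = kernel.evaluate(x_vec=X_test, y_vec=X_train)

svc = SVC(kernel="precomputed").fit(K_train, y_train)
print("quantum-kernel test accuracy:", svc.score(K_test, y_test))
```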
Five papers to read first, in order
The QML literature is large enough that a beginner can spend weeks reading without ever landing on the canonical papers. The five-paper sequence below is the route most practitioners take, and reading them in this order is how you build the same vocabulary and mental model that a 2026 IBM, Quantinuum, or Xanadu applications engineer would.
- Biamonte, Wittek, Pancotti, Rebentrost, Wiebe, and Lloyd, “Quantum machine learning” (Nature 2017). The canonical survey that organised the field into its modern subsections. Read this first to internalise the vocabulary, the four families, and the early advantage claims; everything after this paper is a refinement or refutation of one of its threads.
- Schuld and Killoran, “Quantum machine learning in feature Hilbert spaces” (Phys. Rev. Lett. 2019). The cleanest account of why the encoder is everything in kernel methods. Read this second to understand the geometry that the rest of kernel-method QML is built on.
- Havlíček et al., “Supervised learning with quantum-enhanced feature spaces” (Nature 2019). The first credible end-to-end QML run on real IBM superconducting hardware, and the paper that turned the field from theoretical curiosity into working engineering. Read this third to see what a complete kernel-method pipeline looks like in code, data, and benchmark form.
- Cerezo et al., “Variational quantum algorithms” (Nature Reviews Physics 2021). The engineering bible for everything variational, including the parameter-shift rule, ansatz families, barren plateaus, and the standard mitigations. Read this fourth as the canonical reference you will return to throughout any VQC project.
- Huang, Broughton, Cotler, Chen, Li, Mohseni, Neven, Babbush, Kueng, Preskill, and McClean, “Quantum advantage in learning from experiments” (Science 2022). The “power of data” paper that established the modern position on quantum advantage in ML. Read this fifth and you will know exactly what is and is not provable about a quantum-versus-classical comparison.
The cumulative time to read all five with the supporting math is roughly fifteen to twenty focused hours, which is the right investment before committing to any 2026 QML project. The papers also map directly onto earlier sections of this article: paper one to the four-families taxonomy, paper two to the data-encoding section, paper three to the swap-test deep dive, paper four to the variational subsection, and paper five to the provable-advantage discussion.
Open datasets for QML benchmarking
A small list of datasets that are actually useful for QML practice. The first three are the standard learning toys; the last three are real research benchmarks.
- scikit-learn moons and circles: the canonical 200-point two-class toy datasets used in every QML tutorial; built into scikit-learn.
- MNIST-1D: a 1-dimensional variant of MNIST designed for QML circuit-depth budgets; runs in seconds on a simulator.
- Plus-Minus dataset: a synthetic two-class dataset designed to require entangling feature maps (no classical SVM can separate it perfectly).
- MD17: the standard quantum-chemistry molecular-dynamics benchmark for VQE; used in most published VQE-for-chemistry papers.
- Schatzki quantum-data dataset: a benchmark dataset of quantum states, suitable for QML algorithms that take quantum input directly.
- NIST credit-card-fraud public sample: classical tabular data, used in many quantum-finance benchmarks against classical baselines.
Eight common mistakes that kill a QML pilot
The fastest way to learn what works in quantum machine learning is to know what does not, and the failure patterns are remarkably consistent across teams. The eight pitfalls below are the ones every 2026 production team has tripped over at least once, and any pilot that avoids all of them is in the top quartile of QML projects.
1. Treating QML as a faster classical ML
The most common failure mode is conceptual: assuming the quantum machine is a classical computer that runs the same workload faster. It is not. QML is a different geometry that wears similar clothes, and a project that frames the quantum machine as a drop-in replacement for scikit-learn will almost always discover during benchmarking that the classical baseline wins on cost, latency, and accuracy. The right framing is “specialised accelerator for a narrow problem class”, and that framing has to drive every downstream design decision.
2. Picking generic tabular data for a quantum kernel
The second-most-common failure is choosing the wrong dataset. Quantum kernels show advantage on data with structure no classical feature map can reproduce; generic tabular data with moderate dimension is exactly the regime where classical kernels work fine, and where a quantum kernel cannot beat them no matter how clever the encoder. The diagnostic question is “does the data have a known group-theoretic, geometric, or non-stationary structure that resists classical embedding?”. If the answer is no, pick a different problem.
3. Using a deep generic ansatz
Variational quantum circuits trained with random initialisation and a generic hardware-efficient ansatz almost always run straight into a barren plateau, and the team only discovers it after weeks of optimiser tuning that gets nowhere. The 2026 mitigation is to start with a problem-aware ansatz (restricted to the hardware connectivity graph), a local cost function, and either layer-wise pre-training or warm-starting from a classical pre-trained model. Generic ansatzes work in textbooks; they do not train in production.
4. Underestimating the shot-noise budget
The number of shots required to estimate a kernel value or expectation to a useful precision grows as 1/ε², and the budget compounds across the entire kernel matrix or across every gradient evaluation over the whole training set. A pilot that budgets 1,000 shots per circuit for a 200-by-200 kernel matrix is allocating 40 million shots before any training happens, which on a $0.08-per-shot QPU runs to roughly $3.2 million if executed naively. Plan shots like cloud spend, not like simulator runs.
5. Comparing against a weak classical baseline
The fastest way to manufacture a fake quantum advantage is to compare against an untuned scikit-learn default. The correct baseline is the strongest classical model available, with proper cross-validation and hyperparameter search, evaluated on the same data with the same protocol. The Huang et al. “power of data” 2022 result is the canonical warning: many earlier exponential-advantage claims fall apart when the classical baseline is allowed to use the same training data the quantum algorithm has access to.
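A minimal version of the discipline, assuming scikit-learn; the parameter grid below is illustrative, and the point is only that the classical baseline is tuned and cross-validated on the same split the quantum model will be judged on.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validated hyperparameter search, not library defaults.
grid = GridSearchCV(
    SVC(),
    param_grid={"kernel": ["rbf", "poly"], "C": [0.1, 1, 10, 100], "gamma": ["scale", 0.1, 1]},
    cv=5,
)
grid.fit(X_train, y_train)
print("tuned classical baseline accuracy:", grid.score(X_test, y_test))
# Any quantum-kernel model must beat this number on the same test split
# before an advantage claim is worth discussing.
```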
6. Ignoring queue time in latency calculations
Real QPU access is queued, not on-demand. A cloud-QPU job that takes one minute of compute can sit in queue for hours or days depending on the time of day, the vendor, and the priority class. Latency budgets that assume “compute = wall-clock” are off by one to three orders of magnitude on shared infrastructure, and the queueing delay alone is often the reason a quantum pipeline ships slower than a classical one even when the per-shot cost is favourable.
7. Skipping the dequantisation check
Any quantum-advantage claim based on HHL or amplitude-encoded linear algebra should now survive a dequantisation analysis (Tang 2018 onwards) before it is taken seriously. The Aaronson-Tang line of work has converted several apparent exponential quantum advantages into classical algorithms of comparable asymptotic runtime, and the responsible 2026 default is to assume any new HHL-derived claim has a classical equivalent until proven otherwise. The dequantisation check is fast, free, and the cheapest reality-test in the field.
8. Choosing the hardware before the algorithm
Teams routinely pick a hardware backend (often based on a marketing demo or a vendor partnership) and then try to retrofit a QML algorithm onto it. The right order is reversed: pick the algorithm class that fits the problem, then match it to the hardware modality whose error profile, connectivity, and gate set support that algorithm. Trapped-ion machines are right for variational chemistry, superconducting for kernel methods, neutral-atom for QAOA-style optimisation, photonic for sampling. Choosing in the wrong order wipes out any algorithmic advantage before the project starts.
The dominant failure mode is treating QML as a faster classical ML and skipping the dequantisation check, the classical baseline, or the algorithm-to-modality match. Each of the eight mistakes is recoverable if caught in the proof-of-concept phase, but compounding two or three of them is what produces the “tried quantum ML, did not work” write-ups that crowd the second half of every vendor case study.
Active QML competitions, bounties, and credentials in 2026
The QML community now runs enough open competitions and credential programmes that early-career practitioners have a clear pathway from their first variational classifier to publicly verifiable expertise. The four currently-running programmes below are the ones worth knowing about in mid-2026.
Four programmes worth knowing in 2026
- BlueQubit $20,000 Bitcoin Quantum Advantage Challenge: an open call to demonstrate a quantum-enhanced algorithm that beats a classical baseline on a Bitcoin-relevant cryptographic or sampling task. Open to any team, with the prize paid out in BTC.
- IBM Quantum Challenge (annual): free, open, and run on IBM Quantum Platform with real-hardware time included. Completing the full challenge earns a Credly-verified credential that hiring managers in the field recognise.
- PennyLane Codebook: free Xanadu credential programme, badge-tracked progress through the QML curriculum from basics to variational classifiers, kernel methods, and the chemistry-VQE modules. The standard self-paced credential to put on a CV.
- Qiskit Global Summer School: free intensive online programme, typically held July to August each year, with QML modules and hardware-time credits for participants.
Conferences, hackathons, and the minimum portfolio
Beyond the formal programmes, the QHack hackathon (Xanadu, annual) and the MIT-xPRO short-course catalogue are credible signal in 2026. Most successful early-career QML hires now arrive at interviews with at least one of the credentials above plus a public GitHub project that runs end-to-end on a simulator and a one-shot run on real hardware, which is the implicit minimum portfolio bar.
Academic conferences and the published-paper bar
The two most senior conferences for QML in 2026 are IEEE Quantum Week (QCE) in the autumn and QIP (Quantum Information Processing) in the winter. IEEE QCE has a dedicated Quantum Machine Learning track and posts most of its accepted papers as preprints on arXiv ahead of the proceedings, which makes it the right pulse on what the larger community is currently working on. QIP is more theoretical and tends to host the dequantisation, advantage-theorem, and trainability papers that set the agenda; NeurIPS, ICML, and ICLR also accept QML papers in their main tracks, and the NeurIPS Quantum AI workshop is the canonical venue for papers that target an ML-research-first audience.
For an early-career practitioner, the modern signal of seriousness is a public GitHub project that builds end-to-end on a simulator, runs once on real hardware, and links to a short write-up explaining what was learned. The benchmark repos worth modelling on are PennyLane-AI/qml, Qiskit/qiskit-machine-learning, and the IBM Quantum Challenge solution repositories from the past three years. Most 2026 hiring managers will ask to see this repo before they ask to see your CV, which inverts the classical software-engineering interview funnel that prioritises credentials over public code.
References and further reading
The references below are the canonical primary sources cited in this article, grouped by topic and ordered roughly in increasing depth within each group. Every entry links to a stable primary URL (Nature, Physical Review, arXiv, Quantum journal, or vendor primary documentation); none resolve through aggregator sites or paywalled press summaries. For the curated five-paper starting sequence, see the “Five papers to read first” subsection above.
Foundations and surveys
- Lloyd, Mohseni, and Rebentrost. “Quantum algorithms for supervised and unsupervised machine learning.” arXiv:1307.0411 (2013). The paper that opened the modern QML era.
- Biamonte, Wittek, Pancotti, Rebentrost, Wiebe, and Lloyd. “Quantum machine learning.” Nature 549, 195-202 (2017). The most-cited single QML reference.
- Schuld and Killoran. “Quantum machine learning in feature Hilbert spaces.” Phys. Rev. Lett. 122, 040504 (2019). Cleanest formal account of why encoding is everything.
Kernel methods and quantum-feature-space learning
- Rebentrost, Mohseni, and Lloyd. “Quantum support vector machine for big data classification.” arXiv:1307.0471 (2013).
- Havlíček, Córcoles, Temme, Harrow, Kandala, Chow, and Gambetta. “Supervised learning with quantum-enhanced feature spaces.” Nature 567, 209-212 (2019).
- Liu, Arunachalam, and Temme. “A rigorous and robust quantum speed-up in supervised machine learning.” Nature Physics 17, 1013-1017 (2021). The discrete-log kernel separation result.
- Thanasilp, Wang, Cerezo, and Holmes. “Exponential concentration in quantum kernel methods.” arXiv:2208.11060 (2024). The exponential-concentration negative result.
Variational circuits and trainability
- Mitarai, Negoro, Kitagawa, and Fujii. “Quantum circuit learning.” Phys. Rev. A 98, 032309 (2018). The parameter-shift rule.
- McClean, Boixo, Smelyanskiy, Babbush, and Neven. “Barren plateaus in quantum neural network training landscapes.” Nature Communications 9, 4812 (2018).
- Pérez-Salinas, Cervera-Lierta, Gil-Fuster, and Latorre. “Data re-uploading for a universal quantum classifier.” Quantum 4, 226 (2020).
- Cerezo, Arrasmith, Babbush, Benjamin, Endo, Fujii, McClean, Mitarai, Yuan, Cincio, and Coles. “Variational quantum algorithms.” Nature Reviews Physics 3, 625-644 (2021). The canonical engineering reference.
Quantum advantage and dequantisation
- Aaronson. “Read the fine print.” Nature Physics 11, 291-293 (2015). The original cautionary essay on HHL-style speedup claims.
- Tang. “A quantum-inspired classical algorithm for recommendation systems.” arXiv:1807.04271 (2018). The opening shot of the dequantisation programme.
- Huang, Broughton, Cotler, Chen, Li, Mohseni, Neven, Babbush, Kueng, Preskill, and McClean. “Quantum advantage in learning from experiments.” Science 376, 1182-1186 (2022). The “power of data” paper.
Generative, chemistry, and adjacent
- Lloyd and Weedbrook. “Quantum generative adversarial learning.” arXiv:1804.09139 (2018). The original QGAN proposal.
- Benedetti, Garcia-Pintos, Perdomo, Leyton-Ortega, Nam, and Perdomo-Ortiz. “A generative modeling approach for benchmarking and training shallow quantum circuits.” npj Quantum Information 5, 45 (2019). The canonical QCBM reference.
- McArdle, Jones, Endo, Li, Benjamin, and Yuan. “Variational ansatz-based quantum simulation of imaginary time evolution.” npj Quantum Information 5, 75 (2019). The VQITE foundational paper.
- Grimsley, Economou, Barnes, and Mayhall. “An adaptive variational algorithm for exact molecular simulations.” Nature Communications 10, 3007 (2019). ADAPT-VQE.
Measurement protocols and error mitigation
- Huang, Kueng, and Preskill. “Predicting many properties of a quantum system from very few measurements.” Nature Physics 16, 1050-1057 (2020). The canonical classical-shadows paper.
Explainability and federated QML
- Heese, Wolter, Müller, Tüysüz, and Hellwig. “Explaining quantum circuits with Shapley values.” arXiv:2301.09138 (2023). The Q-LIME / quantum-Shapley line of work.
- Chehimi and Saad. “Quantum federated learning with quantum data.” arXiv:2105.14756 (2022). Federated quantum kernel SVMs.
For the working library documentation (PennyLane, Qiskit, TensorFlow Quantum, Pulser, Perceval, Classiq) that complements these papers in any 2026 deployment, see the libraries section above. Every library is linked to its primary documentation rather than to any aggregator review, and the same rule applies to every external reference in this article.
About this guide
Who wrote it
This guide is produced by Quantum Zeitgeist, an independent quantum-computing news and analysis publication that has covered the field continuously since 2017. The editorial team includes physicists with PhD-level training in quantum information and machine learning, working researchers who maintain a parallel publication record on PennyLane and Qiskit tutorials, and contributors who run their own quantum-computing workloads on IBM, IonQ, and Pasqal hardware. Every technical claim in this article is sourced either to the primary literature or to vendor documentation that we have read directly and cite in-line.
How it is updated
This page is maintained as a living reference rather than a one-time publication. The figures, milestones, market estimates, and roadmap dates are refreshed quarterly against the most recent vendor announcements and the conference proceedings of QIP, IEEE Quantum Week, and Q2B. The 2026 numbers here reflect the state of the field as of the most recent quarterly review; any item that looks out of date is almost certainly next in the update queue.
How to use it
Treat the table of contents as a working index. If you are new to QML, read the definition, glossary, and four-families sections in order. If you are evaluating a QML pilot, jump straight to the decision flowchart, the common-pitfalls list, and the classical-vs-quantum comparison. If you are looking for a specific algorithm, library, or hardware modality, the libraries, hardware-fit, and depth-budget sections are the practical reference. If you only have ten minutes, the Key Takeaways block at the top gives you the eight things every working QML practitioner needs to know.
Frequently asked questions
What is quantum machine learning in simple terms?
QML is the use of quantum computers as accelerators inside otherwise-classical machine learning algorithms. The classical computer handles data loading, optimisation, and convergence checks; the quantum computer evaluates kernel values, runs parameterised circuits, or samples from quantum distributions in the inner loop. The output is a model that can be used like any other machine learning model: feed it new data, get a prediction, cluster assignment, or generated sample. The mechanical pattern is similar to GPU offload in classical machine learning, where most of the program runs on the CPU and the heavy linear algebra runs on a specialised accelerator.
Do I need a real quantum computer to learn QML?
No. Every major QML library (PennyLane, Qiskit, Cirq, TensorFlow Quantum, Classiq) ships with a high-fidelity simulator that runs on a normal laptop or cloud VM, and 95% of QML learning happens on simulators. Real hardware adds queueing time, calibration drift, and shot-noise complications that are not pedagogically useful at the start. Once the algorithm runs correctly on a simulator, switching to real hardware via IBM Quantum Platform, Amazon Braket, or Microsoft Azure Quantum is a single line of code in any of the major libraries.
What are the most important quantum machine learning algorithms?
Five algorithms dominate the literature. Quantum kernel SVMs use the swap test or related primitives to compute kernel values for a classical SVM wrapper. Quantum k-means clustering uses the same primitive in an iterative unsupervised algorithm. Variational quantum classifiers train parameterised circuits with a classical optimiser. Quantum neural networks generalise variational circuits with deeper architectures and back-propagation-friendly parameter-shift gradients. Quantum generative adversarial networks (QGANs) and quantum Boltzmann machines extend the framework to generative modelling. Most production QML deployments in 2026 use one of the first three.
Is QML faster than classical machine learning?
Sometimes. The honest 2026 answer is that QML shows demonstrable advantage on three classes of problems: quantum-native data (chemistry, sensor, simulation), problems with carefully-designed entangling feature maps, and high-dimensional kernel problems where classical-shadows-style measurements reduce the shot budget. On classical tabular data with moderate dimension, classical SVMs, gradient-boosted trees, and small neural networks remain competitive or better. The dequantisation literature has weakened earlier exponential-speedup claims for HHL-based variants, and the practical advantage is now considered uncertain except in narrow regimes with structured input access.
How does data get into a QML circuit?
Through one of four standard encoding schemes. Basis encoding maps each classical bit to one qubit (easiest, no advantage). Amplitude encoding maps a d-dimensional vector to a log₂(d)-qubit state (logarithmic in dimension, expensive state preparation). Angle encoding makes each feature a rotation angle (linear scaling, hardware-efficient). Feature-map encoding uses a deliberately non-trivial parameterised circuit to embed data into a Hilbert space designed for the downstream task. Feature-map encoding with multi-qubit entangling gates is the path to genuine quantum-versus-classical performance separation, and the choice of feature map is the dominant engineering decision.
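For readers who want to see the difference in code, the sketch below builds one circuit per encoding scheme using PennyLane's standard embedding templates (an illustrative choice of library); the input values are arbitrary, and the circuit shapes, not model quality, are the point.

```python
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=3)
x = np.array([0.4, 0.9, 1.7])                    # a three-feature classical input

@qml.qnode(dev)
def basis_encoding(bits):
    qml.BasisEmbedding(bits, wires=range(3))      # one classical bit per qubit
    return qml.state()

@qml.qnode(dev)
def angle_encoding(features):
    qml.AngleEmbedding(features, wires=range(3))  # one rotation angle per feature
    return qml.state()

@qml.qnode(dev)
def amplitude_encoding(features):
    # 2^3 = 8 amplitudes on 3 qubits; the vector is zero-padded and normalised.
    qml.AmplitudeEmbedding(features, wires=range(3), pad_with=0.0, normalize=True)
    return qml.state()

@qml.qnode(dev)
def feature_map_encoding(features):
    # A deliberately non-trivial, entangling feature map (IQP-style).
    qml.IQPEmbedding(features, wires=range(3), n_repeats=2)
    return qml.state()

print(basis_encoding([1, 0, 1]))
print(angle_encoding(x))
print(amplitude_encoding(x))
print(feature_map_encoding(x))
```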
Which QML library should I learn first?
PennyLane. The library has the largest community, the most tutorials, the deepest integration with classical ML stacks (PyTorch, TensorFlow, JAX), and exascale-class scaling through the recent MPI integration on Frontier. PennyLane runs against multiple hardware backends (Xanadu photonic, IBM superconducting, IonQ trapped-ion, AWS Braket simulators) without code changes, which makes it the natural first library for anyone learning the field. Qiskit Machine Learning is the natural second choice if you are already on the IBM Quantum Platform stack, and TensorFlow Quantum is the natural third if you are already on the TensorFlow ecosystem.
Where does quantum machine learning fit in the broader quantum-technology stack?
QML sits at the application layer of the quantum-computing stack, alongside quantum chemistry, quantum optimisation, and quantum simulation. It consumes hardware from trapped-ion, neutral-atom, superconducting, and photonic modalities through the same quantum cloud providers that the rest of the stack uses, and it is implemented through the same software libraries (PennyLane, Qiskit, Cirq, Q#, Classiq) that the rest of the quantum-computing application layer uses. The algorithm sits on top of all of that and translates the quantum primitives into machine-learning-shaped predictions, classifications, clusterings, or generated samples.
What is the difference between QML and classical machine learning?
Three structural differences. First, the data has to be encoded into a quantum state before any quantum operation can act on it, which adds a state-preparation step that classical algorithms do not have. Second, the inner-loop kernel computation, gradient evaluation, or sampling step runs on quantum hardware that exploits superposition and entanglement, which lets the quantum algorithm compute things classical algorithms cannot easily compute. Third, the output of the quantum step is read out through measurement, which collapses the quantum state and limits how much information can be extracted per shot.
QML is most useful when the input is naturally a quantum state, when the inner-loop computation is genuinely hard classically, or when the measurement budget can be reduced through classical-shadows-style protocols. On every other workload the classical baseline is faster, cheaper, and better-understood, which is the honest 2026 default for any general-purpose ML problem.
Is quantum machine learning real, or just hype?
It is real, narrowly. In 2026 there is a small but credible commercial deployment footprint: JPMorgan and HSBC ship quantum-kernel pilots, Roche and Sanofi run VQE chemistry workloads alongside QML feature extraction, and a handful of finance teams report measurable cycle-time wins on specific inner-loop computations. The hype reading is not real. Most generic tabular classification problems do not benefit from QML, claims of exponential speedup have largely been dequantised, and any pitch promising 1000x without naming the exact problem class is marketing. Treat QML as a specialised accelerator for a narrow band of problems, not a replacement for scikit-learn.
How many qubits do I need to do quantum machine learning?
For learning and experimentation, four to eight qubits is enough; simulators handle that range instantly on a laptop. For meaningful classification benchmarks, sixteen to thirty qubits is the standard 2026 working range, and every major cloud QPU (IBM Heron, IonQ Forte, QuEra Aquila, Rigetti Ankaa) sits comfortably in that window. For research on barren plateaus and advantage demonstrations, fifty to a hundred qubits is the live frontier. “More qubits” does not automatically mean “better QML”: gate fidelity and connectivity often matter more than raw count, especially for variational circuits whose depth grows with the ansatz expressivity.
What is a barren plateau in quantum machine learning?
A barren plateau is a region of the loss landscape where the gradient is exponentially small in the number of qubits, so gradient descent cannot make measurable progress. The phenomenon was formalised by McClean et al. in 2018 and shown to be a generic feature of sufficiently deep variational circuits drawn from a 2-design. Practically it means random initialisations of deep generic ansatzes do not train. The 2026 mitigations are problem-aware ansatzes restricted to the hardware connectivity graph, local cost functions, layer-wise pre-training, and symmetry constraints that confine the optimisation to a smaller manifold. Avoiding barren plateaus is the central design problem of variational QML.
Can quantum machine learning help large language models?
The honest 2026 answer is “probably yes, but not at the scale you would hope.” Quantum tensor networks (matrix product states, projected entangled pair states) are being used as efficient classical representations of attention layers. Direct training of LLM-scale models on quantum hardware is well beyond 2026 capabilities, because the parameter count exceeds any near-term qubit register by many orders of magnitude. The realistic intersection is hybrid: classical LLMs do the bulk of the work, and quantum subroutines accelerate specific inner-loop tasks (sampling, structured retrieval, certain factor-graph computations) where the quantum primitive fits the workload.
Is quantum machine learning supervised or unsupervised?
Both, depending on the algorithm family. Quantum kernel SVMs and variational quantum classifiers are supervised: they fit a labelled dataset and produce a classifier or regressor. Quantum k-means clustering, quantum density-estimation models, and quantum Boltzmann machines are unsupervised: they learn structure from unlabelled data. Quantum generative models (QGANs, quantum diffusion models) sit on the unsupervised side too, and quantum reinforcement learning is a separate paradigm where the model learns from reward signals rather than labels. The choice of paradigm is determined entirely by what the data and task require, exactly as in classical machine learning.
