Encoding information for storage or transmission is one of the fundamental tasks of information theory. For decades, the bit has been the fundamental unit of information science. However, advances in quantum computing have led to computers that use qubits (quantum bits) instead of bits. A qubit can be 0, 1, or in a superposition of both states at once. Empowered by superposition, entanglement, and interference, quantum computers can solve certain problems faster than conventional computers through various quantum algorithms.
Any algorithm that processes input data must first load that data. To load data onto a quantum computer, it must be encoded in qubits. Each algorithm expects a particular data encoding and then processes the data by performing calculations on it. Both the data itself and the chosen encoding influence the runtime of the loading process; in the worst case, loading requires exponential time. This is critical because quantum algorithms that promise a speed-up assume that the data can be loaded faster, in logarithmic or linear time.
Encoding data in qubits is not trivial. Current devices contain a limited number of qubits that are stable only for a short time. The number of operations used to prepare the quantum state must be small, as qubits decay fast and quantum gates are error-prone. To encode even a large number of data values efficiently, a logarithmic or linear runtime is ideal. Each encoding is essentially a trade-off between two major forces: (i) the number of required qubits and (ii) the runtime complexity of the loading process. Quantum encoding is also notoriously difficult because the laws of quantum mechanics impose severe constraints; for example, the no-cloning theorem forbids copying an unknown quantum state, which rules out simple encoding schemes.
One way to encode data is quantum embedding, which represents classical data as quantum states in a Hilbert space via a quantum feature map. The power of a quantum classifier translates into the ability to find embeddings that maximize the distance between data clusters in Hilbert space. Common embedding techniques include basis embedding and amplitude embedding, among others.
Basis encoding is primarily used when real numbers have to be manipulated arithmetically in the course of a quantum algorithm. It represents real numbers as binary numbers and then transforms them into a quantum state in the computational basis. The embedded quantum state is the bit-wise translation of a binary string into the corresponding states of the quantum subsystems, so one bit of classical information is represented by one quantum subsystem. Acting on binary features encoded as qubits gives us the most computational freedom to design quantum algorithms: in principle, every operation on bits that a classical computer can execute can also be performed on a quantum computer.
Example: Let us choose a binary fraction representation, where each number in the interval [0, 1) is represented by a τ-bit string b₁…b_τ according to x = Σ_{k=1}^{τ} b_k 2^{−k}.
To encode the vector x = (0.1, −0.7, 1.0) in basis encoding, we first translate it into a binary sequence, choosing a binary fraction representation with precision τ = 4 and using the first bit to encode the sign:
0.1 → 0 0001 …
−0.7 → 1 1011 …
1.0 → 0 1111 …
The vector corresponds to the binary sequence b = 00001 11011 01111 and can be represented by the 15-qubit basis state |00001 11011 01111⟩.
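The translation above is easy to reproduce classically. The following sketch (function name illustrative) truncates a real number in [−1, 1] to a sign bit followed by a τ-bit binary fraction, recovering the sequence b from the example:

```python
def to_binary_fraction(x, tau=4):
    """Encode x as a sign bit followed by a tau-bit binary
    fraction of |x| (truncated, not rounded)."""
    sign = '1' if x < 0 else '0'
    frac, bits = abs(x), []
    for k in range(1, tau + 1):
        bit = int(frac >= 2 ** -k)   # does 2^-k fit into the remainder?
        bits.append(str(bit))
        frac -= bit * 2 ** -k
    return sign + ''.join(bits)

x = (0.1, -0.7, 1.0)
b = ' '.join(to_binary_fraction(xi) for xi in x)
# b == '00001 11011 01111'
```

Note that the representation is lossy: with τ = 4, the string 0001 stands for 0.0625, the closest truncated value to 0.1.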
In the amplitude-embedding technique, every quantum system is described by its wavefunction ψ, which also defines the measurement probabilities. Saying that the wavefunction is used to encode data therefore means that the amplitudes of the quantum system represent the data values.
Amplitude encoding is required by many quantum machine learning algorithms. Its main advantage is that we only need n = ⌈log₂(NM)⌉ qubits to encode a dataset of M inputs with N features each. If the quantum machine learning algorithm is polynomial in n (or qubit-efficient), it has a logarithmic runtime dependency on the dataset size. Promises of exponential speedups from qubit-efficient quantum machine learning algorithms sound strange to machine learning practitioners, because simply loading the NM features from the memory hardware takes time that is, of course, linear in NM. And indeed, the promises only hold if state preparation can also be done qubit-efficiently. This is in fact possible in some cases that exhibit a lot of structure. As a trivial example, if the goal is to produce a vector of uniform entries, we simply need to apply n Hadamard gates, which is a qubit-efficient state preparation. On the other hand, there are subspaces of a Hilbert space that cannot be reached from a given initial state with qubit-efficient algorithms. It is therefore an important and nontrivial open question which classes of states relevant for machine learning with amplitude encoding can be prepared qubit-efficiently.
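As a quick sanity check on that qubit count, a minimal sketch (function name illustrative):

```python
from math import ceil, log2

def qubits_for_amplitude_encoding(M, N):
    """Qubits needed to hold M inputs with N features each when
    every data value occupies one amplitude: ceil(log2(M * N))."""
    return ceil(log2(M * N))

# One million 64-dimensional vectors fit into the amplitudes
# of a 26-qubit state, since log2(64 * 10**6) ~ 25.93.
n = qubits_for_amplitude_encoding(10**6, 64)
```

The logarithmic compression is exactly why state preparation, not qubit count, becomes the bottleneck.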
In quantum machine learning, many models use amplitude encoding, meaning that a data vector is represented by the amplitudes of a quantum state. Especially when trying to reproduce neural-network-like dynamics, one would like to perform nonlinear transformations on the data. But while linear transformations are natural in quantum theory, nonlinearities are difficult to design in this context. A quantum feature map encodes classical inputs into quantum states, embedding the data in a high-dimensional Hilbert space. The feature map approach ‘outsources’ the nonlinearity into the procedure of encoding inputs into a quantum state and therefore offers an elegant solution to the problem of nonlinearities in amplitude encoding.
There are two main classes of amplitude encoding: coherent and incoherent. The coherent version is used, for example, in nearest-neighbor classification algorithms, while the incoherent version is more common in quantum neural networks. PennyLane offers a loading routine for amplitude encoding.
Example: Let us encode the same vector x = (0.1, −0.7, 1.0) from the above example in amplitude encoding. We first normalize it to unit length (rounding to three digits here) and pad it with zeros so that its dimension becomes a power of two:

x′ = (0.082, −0.572, 0.816, 0).

Now it can be represented by a quantum state of 2 qubits:

|ψ⟩ = 0.082 |00⟩ − 0.572 |01⟩ + 0.816 |10⟩ + 0 |11⟩.

This state would at the same time encode the 2 × 2 matrix whose entries, read row-wise, are the amplitudes of the basis states |00⟩, |01⟩, |10⟩, |11⟩:

A = ( 0.082  −0.572 ; 0.816  0 ).
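The normalize-and-pad step can be reproduced with a short sketch (function name illustrative); it returns the amplitude vector and the number of qubits it occupies:

```python
from math import sqrt, ceil, log2

def amplitude_encode(x):
    """Normalize x to unit length and zero-pad to the next power
    of two, yielding the amplitudes of an n-qubit state."""
    norm = sqrt(sum(v * v for v in x))
    amps = [v / norm for v in x]
    n = ceil(log2(len(amps)))             # qubits needed
    amps += [0.0] * (2 ** n - len(amps))  # pad to dimension 2^n
    return amps, n

amps, n = amplitude_encode([0.1, -0.7, 1.0])
# amps ~ [0.082, -0.572, 0.816, 0.0] on n = 2 qubits
```

A quick check that the squared amplitudes sum to one confirms the state is properly normalized.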
Qsample encoding has been introduced to associate a real amplitude vector (√p₁, …, √p_N)ᵀ with a classical discrete probability distribution p₁, …, p_N. This is in a sense a hybrid of basis and amplitude encoding, since the information we are interested in is represented by amplitudes, but the N features are encoded in the qubits. An amplitude-efficient quantum algorithm is therefore exponential in the input dimension N, while a qubit-efficient algorithm is polynomial in the input.
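The mapping between a distribution and its qsample is a simple square root; measuring the resulting state in the computational basis reproduces the distribution via the Born rule. A minimal sketch (function name illustrative):

```python
from math import sqrt

def qsample_amplitudes(p):
    """Map a discrete probability distribution p to the amplitude
    vector (sqrt(p_1), ..., sqrt(p_N)) of a qsample state."""
    assert abs(sum(p) - 1.0) < 1e-9, "p must be normalized"
    return [sqrt(pi) for pi in p]

amps = qsample_amplitudes([0.5, 0.25, 0.125, 0.125])
probs = [a * a for a in amps]   # Born rule: measurement recovers p
```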
Quantum machine learning aims to reduce complexity in terms of either samples (sample complexity) or the number of operations needed to train a model, classify a test vector, or generate a novel example of a concept. QML applications where the input is manifestly quantum can, in some cases, provide exponential advantages over their classical analogs in either sample or time complexity. Various quantum encoding techniques such as basis encoding, amplitude encoding, and qsample encoding are used to convert classical data into quantum states. We conclude that there is no single best encoding for quantum computation addressing different problems on current devices. If arithmetic computations are to be performed, a digital encoding such as basis encoding may be preferred. To store as much data as possible in a small number of qubits, a compact encoding like amplitude encoding may be the best choice. However, it must also be taken into account that state preparation for amplitude encoding is costly in terms of operations.