A new quantum convolutional neural network (QCNN) layer construction is exactly equivariant under pixel cyclic shifts, the symmetry that arises when an image is shifted with wrap-around, according to Dmitry Chirkov and Igor Lobanov of ITMO University. The work formalises a mismatch between existing QCNNs and pixel shifts, and introduces a deep network architecture built from quantum Fourier transforms, Fourier multiplexers and inverse quantum Fourier transforms. Theoretical evidence suggests this design avoids barren plateaus, a common obstacle to training deep quantum circuits.
Pixel cyclic shift equivariant layers unlock substantial gains in quantum image recognition
A mean test accuracy of 79.26% was recorded on the translated-MNIST benchmark, 37.04 percentage points above the 42.22% achieved by a matched random-basis control. Previous quantum convolutional neural networks (QCNNs) did not align with the pixel-shift symmetry of image data, leaving a gap between quantum circuit design and classical image processing techniques; this work addresses that gap. The authors formally characterised all pixel cyclic shift (PCS)-equivariant unitaries, showing that any such layer can be constructed from a quantum Fourier transform, a ‘Fourier-mode multiplexer’ and an inverse quantum Fourier transform. Theoretical evidence suggests that networks built from these layers avoid barren plateaus, which would allow deeper quantum networks for image recognition. The significance lies in exploiting the symmetries of image data directly in the quantum circuit, potentially leading to more efficient and robust quantum image recognition systems.
The benchmark is a translated version of the MNIST dataset in which digits are resized to 16×16 and positioned on a 32×32 canvas. On this task a classical convolutional neural network reaches 97.89% accuracy, compared with 48.93% for a multilayer perceptron, which highlights the importance of building translational symmetry into the network architecture. Placing a 16×16 digit on a larger canvas introduces a translation component that convolutional networks, both classical and now quantum, are well suited to handle.

The team also investigated the impact of finite measurement shots, a practical limitation of quantum hardware, reporting both ideal, infinite-shot accuracy and performance under realistic sampling constraints. This matters because current quantum computers are noisy intermediate-scale quantum (NISQ) devices, with limited qubit counts and susceptibility to errors. Prolonged training in the infinite-shot setting can, however, reduce accuracy at lower shot budgets, indicating a trade-off between training length and robustness to sampling noise.

Analysis of gradient behaviour gave a lower bound on the expected squared gradient norm that remains constant as network depth increases, suggesting the architecture avoids a depth-induced reduction in gradient magnitude. This is notable because vanishing gradients are a major obstacle in training deep neural networks, both classical and quantum. In statevector simulations on translated-MNIST, the PCS-QCNN reached a higher final mean test accuracy than a matched random-basis control, 79.26% versus 42.22%. The matched control is intended to isolate the contribution of the PCS-equivariant design from that of the rest of the circuit.
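The benchmark construction described above, a 16×16 digit placed on a 32×32 canvas, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' data pipeline: the function name `place_on_canvas`, the uniform random choice of offset and the random stand-in for a resized digit are all assumptions, since the article does not say how positions were sampled.

```python
import numpy as np

def place_on_canvas(digit, canvas_size=32, rng=None):
    """Place a small image at a random offset on a blank square canvas.

    Hypothetical helper: the uniform offset distribution is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = digit.shape
    # Largest top-left offset that keeps the digit fully inside the canvas.
    max_row, max_col = canvas_size - h, canvas_size - w
    row = int(rng.integers(0, max_row + 1))
    col = int(rng.integers(0, max_col + 1))
    canvas = np.zeros((canvas_size, canvas_size), dtype=digit.dtype)
    canvas[row:row + h, col:col + w] = digit
    return canvas, (row, col)

rng = np.random.default_rng(0)
digit = rng.random((16, 16))  # stand-in for a resized 16x16 MNIST digit
canvas, offset = place_on_canvas(digit, rng=rng)
print(canvas.shape, offset)
```

Because the same digit can land anywhere on the canvas, a model that treats each pixel position independently has to relearn the digit at every offset, which is consistent with the gap reported between the convolutional network and the multilayer perceptron.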
Achieving translation equivariance in quantum convolutional neural networks for image processing
Quantum convolutional neural networks, or QCNNs, are being developed to replicate the success of their classical counterparts in image recognition. To do so, they must be designed to match how image data is encoded and to respect translation equivariance: when the pixels of an input are shifted, the network's internal features shift in a corresponding way rather than changing unpredictably. This property is fundamental in image processing, because it lets a network respond to features regardless of their location, so an object can still be recognised after it moves within the image. Establishing these foundational principles allows for more focused development, even before fully optimised quantum hardware is available, and paves the way for practical applications. The authors characterised all pixel cyclic shift (PCS)-equivariant unitaries, demonstrating that any layer respecting pixel shifts can be constructed from a quantum Fourier transform, followed by a ‘Fourier-mode multiplexer’ and an inverse quantum Fourier transform. The quantum Fourier transform (QFT) is the quantum analogue of the discrete Fourier transform, a widely used tool in classical signal processing. It moves the processing into the frequency domain, where a cyclic shift of the pixels becomes a phase factor on each Fourier mode, which is easier to handle, and the ‘Fourier-mode multiplexer’ then processes the information in that domain.

This formal link between quantum circuit design and the symmetries of image data marks a key step for quantum machine learning, potentially enabling more complex network designs by avoiding barren plateaus, which hinder the training of deep quantum circuits. In a barren plateau, gradient norms become vanishingly small, for example decaying exponentially as the number of layers increases, which makes the network difficult to train effectively. By designing layers that maintain a constant lower bound on the expected squared gradient norm, the new architecture mitigates the risk of barren plateaus, allowing for deeper and more powerful QCNNs. The implications of this work extend beyond image recognition; the principles of PCS-equivariance and the use of Fourier transforms could be applied to other types of data with inherent symmetries, such as audio or video.
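The Fourier construction described above can be checked numerically on a small example. The sketch below is an illustration under stated assumptions, not the paper's circuit: it works with plain matrices rather than quantum gates, uses a single pixel register of size 8, and takes the simplest possible multiplexer, one free phase per Fourier mode (the article does not give the general form). The names `dft_matrix`, `cyclic_shift` and `fourier_phase_layer` are hypothetical.

```python
import numpy as np

def dft_matrix(n):
    """Unitary discrete Fourier transform matrix (the matrix a QFT implements)."""
    k = np.arange(n)
    return np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

def cyclic_shift(n):
    """Permutation matrix sending basis state |x> to |x + 1 mod n>."""
    return np.roll(np.eye(n), 1, axis=0)

def fourier_phase_layer(phases):
    """Layer of the form F^dagger D F, with D a diagonal matrix of phases.

    Assumption: this is the simplest multiplexer, one phase per Fourier mode.
    """
    F = dft_matrix(len(phases))
    D = np.diag(np.exp(1j * np.asarray(phases)))
    return F.conj().T @ D @ F

n = 8
rng = np.random.default_rng(1)
F = dft_matrix(n)
S = cyclic_shift(n)
U = fourier_phase_layer(rng.uniform(0, 2 * np.pi, size=n))

# In the Fourier basis the shift is diagonal: each mode only picks up a phase.
S_fourier = F @ S @ F.conj().T
off_diagonal = S_fourier - np.diag(np.diag(S_fourier))
print("largest off-diagonal entry of F S F^dagger:", np.abs(off_diagonal).max())

# Equivariance: shifting then applying the layer equals applying then shifting.
print("norm of U S - S U:", np.linalg.norm(U @ S - S @ U))
```

Both printed values should be zero up to floating-point rounding. The layer commutes with the shift because the shift is diagonal in the Fourier basis and diagonal matrices commute with each other, which is the mechanism behind the statement that shifts become phase changes in the frequency domain.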
The researchers characterised all unitaries that respect pixel cyclic shifts in quantum convolutional neural networks. In practice, this identifies how to build network layers that respond to image features regardless of their location, a crucial element of image processing. By combining the quantum Fourier transform with a ‘Fourier-mode multiplexer’, the team showed how to simplify the handling of image shifts within the network. They also give theoretical evidence that the architecture avoids a common training problem in deep quantum circuits: a lower bound on the expected squared gradient norm stays constant as the network grows deeper.
👉 More information
🗞 Pixel-Translation-Equivariant Quantum Convolutional Neural Networks via Fourier Multiplexers
🧠 ArXiv: https://arxiv.org/abs/2604.06094
