Scientists are tackling a persistent challenge in coherent diffraction imaging: reconstructing images from incomplete data, particularly for crystals with significant distortions. Jialun Liu from the London Centre for Nanotechnology, University College London, David Yang from the Condensed Matter Physics and Materials Science Department, Brookhaven National Laboratory, and Ian Robinson from the London Centre for Nanotechnology, University College London, present an approach that uses a Fourier Vision Transformer (Fourier ViT) to overcome limitations in traditional phase retrieval methods. The work introduces an unsupervised learning technique that solves the multi-domain phase retrieval problem directly from 2D diffraction intensities. It reconstructs images accurately, and with increased robustness, even in the ‘strong-phase’ regime, where distortions exceed half a lattice spacing, which makes it a substantial advance for analysing complex materials at the nanoscale.
Bragg coherent diffraction imaging (BCDI) is a lensless X-ray technique used to reconstruct the three-dimensional internal structure of nanocrystals. It encounters a significant challenge in the “strong-phase” regime, where distortions within the crystal exceed half a lattice spacing, leading to complex diffraction patterns and unreliable reconstructions. Fourier ViT couples information across reciprocal space using multiscale Fourier token mixing, which captures global relationships within the diffraction data; unlike conventional convolutional neural networks, it can capture long-range dependencies directly. Shallow convolutional layers then refine the reconstruction locally, so the model combines the broad context of the transformer with fine detail.
To validate Fourier ViT, the researchers generated large-scale synthetic datasets simulating Voronoi multi-domain crystals with strong-phase contrast and realistic noise corruptions, allowing controlled assessment of the method’s performance under varying conditions. The method reconstructs domain-resolved phases even as the number of domains within the crystal increases, and it achieves the lowest reciprocal-space mismatch, a measure of reconstruction error, among the techniques compared. Specifically, Fourier ViT achieves a reciprocal-space mismatch of 0.032, demonstrating successful phase retrieval in the strong-phase regime, where traditional iterative solvers often fail.
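To make the validation setup concrete, the following is a minimal sketch of how a Voronoi multi-domain test crystal, its noisy diffraction pattern, and a reciprocal-space mismatch could be computed with NumPy. It is not the authors' code: the function names, the disc-shaped support, the Poisson noise model, and in particular the mismatch formula (a normalised squared amplitude difference) are assumptions chosen for illustration, since the article does not define them.

```python
import numpy as np

def voronoi_crystal(n=64, n_domains=4, radius=20, seed=0):
    """Unit-amplitude disc split into Voronoi domains, one constant phase each."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[:n, :n]
    support = (yy - n / 2) ** 2 + (xx - n / 2) ** 2 <= radius ** 2
    # Draw seed points inside the square inscribed in the disc (illustrative choice).
    half = radius / np.sqrt(2)
    seeds = rng.uniform(n / 2 - half, n / 2 + half, size=(n_domains, 2))
    # Label every pixel by its nearest seed: this is the Voronoi partition.
    d2 = (yy[..., None] - seeds[:, 0]) ** 2 + (xx[..., None] - seeds[:, 1]) ** 2
    labels = d2.argmin(axis=-1)
    # One phase per domain, drawn over the full 2*pi range.
    phases = rng.uniform(-np.pi, np.pi, size=n_domains)
    return support * np.exp(1j * phases[labels]), support

def diffraction_intensity(rho):
    """Far-field intensity: squared modulus of the 2D Fourier transform."""
    return np.abs(np.fft.fftshift(np.fft.fft2(rho))) ** 2

def add_poisson_noise(intensity, total_counts=1e6, seed=0):
    """Scale to a photon budget and apply Poisson counting noise (assumed model)."""
    rng = np.random.default_rng(seed)
    scaled = intensity * (total_counts / intensity.sum())
    return rng.poisson(scaled).astype(float)

def reciprocal_mismatch(rho_pred, measured_intensity):
    """Assumed metric: normalised squared difference of diffraction amplitudes."""
    calc = np.abs(np.fft.fftshift(np.fft.fft2(rho_pred)))
    meas = np.sqrt(measured_intensity)
    calc = calc * (np.linalg.norm(meas) / np.linalg.norm(calc))  # remove scale
    return np.sum((calc - meas) ** 2) / np.sum(meas ** 2)

rho, support = voronoi_crystal()
noisy = add_poisson_noise(diffraction_intensity(rho))
print("mismatch of the true object against noisy data:",
      reciprocal_mismatch(rho, noisy))
```

Because only amplitudes enter the metric, a low mismatch shows agreement with the measured pattern rather than a unique real-space solution, which is why the article also reports domain-resolved phases.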
The model tokenises each input diffraction pattern, initially 64×64 pixels, by projecting every 4×4 patch into a token with an embedding dimension of 128, which yields a 16×16 grid of 256 tokens. This patch size ensures that each token captures at least one full fringe period, so it encodes physically meaningful local information. Multi-scale Fourier attention reshapes the tokens into feature maps and processes them at three spatial scales (1:4, 1:2, and 1:1), using 2D fast Fourier transforms (FFTs) to couple reciprocal-space information globally; learned per-channel frequency responses and spectral gates further refine the signal at each scale. The decoder upsamples the output sequence to 64×64, fusing it with the encoder skip map and a frequency-space […].
The network outputs a complex real-space density on a 64×64 grid, constrained by a fixed real-space support mask; the predicted amplitude is blended with a prior via a scalar schedule and normalised to match the prior’s ℓ2 norm. The loss function combines the Pearson correlation coefficient, a root-mean-square χ² term, a power-weighted χ² term, and a total variation regulariser, with weights adjusted throughout training to prioritise global pattern correlation and fine-scale intensity agreement.
On experimental diffraction data from a La₂₋ₓCaₓMnO₄ nanocrystal, Fourier ViT matches the performance of a MATLAB iterative benchmark and is more resilient to variations in starting conditions, as shown by a higher success rate in reaching low reconstruction errors than a complex convolutional baseline. The ability to reconstruct complex crystal structures rapidly and reliably opens new avenues for in situ and operando experiments, allowing researchers to observe dynamic processes within materials as they occur.
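The tokenisation arithmetic and the idea of mixing tokens globally through an FFT can be sketched in a few lines of NumPy. This is an illustrative reading of the description, not the published architecture: the random projection `w_embed` stands in for a learned embedding, the all-ones `responses` stand in for learned per-channel frequency responses, the pooling factors 1, 2 and 4 are an assumed interpretation of the three scales, and the spectral gates, skip connections and decoder are omitted.

```python
import numpy as np

def patchify(image, patch=4):
    """Split an (H, W) image into non-overlapping patches: (H/p * W/p, p*p)."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return x.reshape(gh * gw, patch * patch), (gh, gw)

def fourier_mix(fmap, response):
    """Filter a (gh, gw, C) feature map per channel in the 2D Fourier domain.

    Every output position depends on every input position, which is what
    makes the mixing global.
    """
    spec = np.fft.rfft2(fmap, axes=(0, 1))
    return np.fft.irfft2(spec * response, s=fmap.shape[:2], axes=(0, 1))

def multiscale_fourier_mix(tokens, grid, responses):
    """Average-pool to each scale, mix in Fourier space, upsample, and average."""
    gh, gw = grid
    fmap = tokens.reshape(gh, gw, -1)
    out = np.zeros_like(fmap)
    for s, response in responses.items():
        pooled = fmap.reshape(gh // s, s, gw // s, s, -1).mean(axis=(1, 3))
        mixed = fourier_mix(pooled, response)
        # Nearest-neighbour upsampling back to the full token grid.
        out += np.repeat(np.repeat(mixed, s, axis=0), s, axis=1)
    return (out / len(responses)).reshape(gh * gw, -1)

rng = np.random.default_rng(0)
pattern = rng.random((64, 64))                    # stand-in diffraction pattern
patches, grid = patchify(pattern)                 # 256 patches of 16 pixels
w_embed = rng.normal(scale=0.1, size=(16, 128))   # hypothetical learned projection
tokens = patches @ w_embed                        # 256 tokens, dimension 128

# Identity frequency responses (hypothetical; these would be learned).
responses = {s: np.ones((16 // s, 16 // s // 2 + 1, 128)) for s in (1, 2, 4)}
mixed = multiscale_fourier_mix(tokens, grid, responses)
print(tokens.shape, mixed.shape)  # (256, 128) (256, 128)
```

A useful property of this style of mixing is its cost: an FFT over the token grid scales as N log N in the number of tokens, whereas pairwise self-attention scales as N².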
By addressing the long-standing challenges of strong-phase retrieval, this work paves the way for more efficient and insightful investigations into a wide range of nanoscale materials and phenomena. Scientists are increasingly turning to machine learning to overcome long-standing challenges in materials characterisation, and this development represents a significant step forward in BCDI, particularly for complex, multi-domain crystals. While the method demonstrates robustness on experimental data, its performance on novel or unpredictable material systems needs further validation, and future work will likely focus on reducing computational cost and on incorporating prior physical knowledge into the learning process.
👉 More information
🗞 Vision Transformer for Multi-Domain Phase Retrieval in Coherent Diffraction Imaging
🧠 ArXiv: https://arxiv.org/abs/2602.12255
