The N-Queens problem, a notoriously difficult challenge in computational complexity, has long resisted complete solutions for even moderately sized boards, with verified results currently limited to 26 queens. Guangchao Yao from XiaoPeng Motors and Yali Li now report a significant breakthrough, successfully verifying the solution to the 27-Queens problem using a novel approach to parallel computing on graphics processing units. Their method, based on an iterative depth-first search algorithm and carefully optimised memory access, overcomes previous limitations in computational speed and resource demands, confirming earlier results obtained using specialised hardware. This achievement, completed in just 28. 4 days using eight RTX 5090 GPUs, not only validates existing findings but also dramatically reduces the estimated time required to solve the next unsolved case, the 28-Queens problem, bringing a resolution within reach and offering a substantial advance in tackling this long-standing mathematical puzzle.
The problem of verifying solutions, particularly using FPGA hardware, has proven time-consuming, often requiring approximately one year for completion. Recent studies suggested that verifying a 27-Queens solution would still require approximately 17 months, indicating excessively high demands on both time and computational resources.
Iterative DFS for GPU N-Queens Counting
This research details a highly optimized CUDA-based algorithm for solving the N-Queens problem, focusing on accurately counting the number of solutions. The researchers achieved a new record by solving the 27-Queens problem in 28. 4 days using eight RTX 5090 GPUs. The team redesigned the traditional recursive search algorithm into an iterative version, allowing the entire stack to fit within the GPU’s fast shared memory, significantly reducing memory access times. A crucial optimization involved designing a shared memory access pattern that eliminates bank conflicts, a common performance bottleneck in GPU programming.
By fitting the entire stack within shared memory, the algorithm avoids slower global memory access and leverages the parallel processing capabilities of GPUs. The researchers successfully computed the number of solutions for the 27-Queens problem in 28. 4 days using eight RTX 5090 GPUs, achieving over a 10x speedup compared to the latest CUDA implementation using the same configuration. The solution was achieved faster than previous FPGA-based solutions, and the code is designed to be portable and scalable across various NVIDIA GPU platforms. The researchers have not only achieved a new performance record but also provided valuable insights into optimizing GPU-based algorithms for complex combinatorial problems.
Queens Puzzle Solved With GPU Parallelism
Scientists have achieved a significant breakthrough in solving the computationally challenging N-Queens problem, successfully verifying the solution for the 27-Queens puzzle in just 28. 4 days. This accomplishment utilized eight NVIDIA RTX 5090 GPUs and confirms a previously obtained result. The team’s innovative approach centers on a refined iterative depth-first search algorithm, meticulously optimized for parallel processing on GPU architecture. A key element of their success lies in mapping the algorithm’s stack structure directly to the GPU’s shared memory, minimizing data access times and maximizing computational efficiency.
The research team also implemented sophisticated techniques to avoid bank conflicts, a common performance bottleneck in GPU computing, through carefully designed memory access patterns. These optimizations collectively deliver substantial performance gains, achieving over a 10x speedup on identical hardware configurations and exceeding 26x acceleration when employing eight RTX 5090 GPUs. Beyond verifying the 27-Queens solution, the team has significantly reduced the projected solving time for the next open case, the 28-Queens problem, to approximately 11 months, bringing a long-standing computational challenge within reach.
Queens Solved, 28-Queens Now Achievable
This research team has achieved a significant breakthrough in solving the computationally challenging N-Queens problem, a classic puzzle in computer science. They successfully verified the solution for the 27-Queens problem in just 28. 4 days using eight RTX 5090 GPUs, confirming the accuracy of a prior result. Furthermore, the team substantially reduced the computational burden for tackling the next unsolved case, the 28-Queens problem, estimating a solving time of approximately 11 months, making its resolution realistically achievable. This advancement stems from an innovative parallel computing method implemented on GPU platforms, featuring a depth-first search algorithm optimized for GPU architecture. Key to their success was mapping the algorithm’s stack structure to GPU shared memory and carefully designing memory access patterns to avoid performance bottlenecks. The researchers acknowledge that achieving optimal performance requires pre-placing six rows and employing a non-uniform task partition strategy, and that the number of GPU cores is the primary determinant of performance for this problem.
👉 More information
🗞 High-Performance N-Queens Solver on GPU: Iterative DFS with Zero Bank Conflicts
🧠 ArXiv: https://arxiv.org/abs/2511.12009
