Tumour Mapping Breakthrough Reveals Hidden Cell Patterns with 96% Clarity

Emerging spatial transcriptomics platforms such as Xenium capture molecular information and cellular context simultaneously, and the resulting data are extremely high-dimensional and hard to analyse. Md Ishtyaq Mahmud from the University of Houston, Veena Kochat and Suresh Satpati from MD Anderson Cancer Center, and colleagues present hSNMF, a hybrid spatially regularized nonnegative matrix factorization method designed to improve representation learning and clustering of these datasets. Applied to cholangiocarcinoma data, the method improves both computational metrics, such as cluster compactness and separability, and biological coherence, offering a new tool for studying tissue architecture and gene expression patterns.

Spatial dimensionality reduction techniques for single-cell transcriptomic data aim to visualize and analyze high-dimensional data in fewer dimensions

Researchers have developed new computational methods to unlock the full potential of high-resolution spatial transcriptomics data generated by platforms like Xenium. These platforms capture detailed molecular information alongside spatial context at the single-cell level, but the resulting data’s high dimensionality presents significant analytical challenges.

This work introduces two novel approaches, Spatial NMF (SNMF) and Hybrid Spatial NMF (hSNMF), designed to improve representation learning and clustering of these complex datasets. Both methods build upon nonnegative matrix factorization, a technique commonly used in genomics, by incorporating spatial information to enhance the accuracy and biological relevance of the resulting analyses.
SNMF establishes a baseline by smoothing each cell’s NMF factor vector based on its spatial neighborhood, effectively integrating local spatial information into the dimensionality reduction process. Building on this foundation, hSNMF further refines the analysis by combining spatial proximity, determined through a contact-radius graph, with transcriptomic similarity via a tunable mixing parameter.
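
As a rough illustration of the tunable mixing idea, the sketch below blends a spatial graph with a transcriptomic-similarity graph. The convex combination, the function name mix_graphs, the parameter name alpha, and the toy edge weights are all assumptions made for illustration; the article does not state the exact form the authors use.

```python
def mix_graphs(spatial_w, expr_w, alpha):
    """Blend two weighted graphs given as {(i, j): weight} dicts.

    alpha = 1.0 keeps only spatial proximity, alpha = 0.0 keeps only
    transcriptomic similarity. The convex combination is an assumption,
    not the published hSNMF formula.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mixed = {}
    for edge in set(spatial_w) | set(expr_w):
        weight = (alpha * spatial_w.get(edge, 0.0)
                  + (1.0 - alpha) * expr_w.get(edge, 0.0))
        if weight > 0.0:
            mixed[edge] = weight
    return mixed


# Toy graphs over three cells (made-up weights).
spatial = {(0, 1): 1.0, (1, 2): 1.0}   # cells in physical contact
expr = {(0, 1): 0.8, (0, 2): 0.6}      # cells with similar expression
hybrid = mix_graphs(spatial, expr, alpha=0.5)
```

With alpha near 1 the hybrid graph favours geometric contiguity; with alpha near 0 it favours molecular similarity.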

This hybrid approach allows for joint optimization of both geometric contiguity and molecular coherence, promising a more comprehensive understanding of tissue organization. Evaluated on a cholangiocarcinoma dataset, SNMF and hSNMF demonstrate substantial improvements over existing methods. Specifically, the research reveals markedly improved spatial compactness, with CHAOS values below 0.004 and Moran’s I exceeding 0.96.

Furthermore, the new methods achieve greater cluster separability, evidenced by Silhouette scores greater than 0.12 and DBI values less than 1.8, alongside enhanced biological coherence as measured by CMC and enrichment analyses. These results suggest that hSNMF and SNMF offer scalable, interpretable, and parameter-efficient baselines for analyzing high-dimensional gene expression data and uncovering spatial organization within tissues. The implementation is publicly available at https://github.com/ishtyaqmahmud/hSNMF.

Xenium spatial transcriptomics data processing and spatial graph construction require specialized bioinformatics pipelines

A 480-gene target panel on the Xenium spatial transcriptomics platform generated data from 25 cholangiocarcinoma patients, comprising a total of 40 tumor microarray cores and approximately 212,000 cells. High-resolution tissue images were processed to create single-cell expression matrices alongside spatial coordinates for each cell.

Initial quality control removed genes detected in fewer than three cells and cells with fewer than 200 detected genes, including negative-control probes. Doublet detection was performed using Scrublet, excluding cells with doublet scores exceeding 0.2, resulting in 191,125 high-confidence single-cell profiles for subsequent analysis.
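
The filtering thresholds above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline: the function name quality_filter is hypothetical, doublet scores are passed in as a precomputed list rather than obtained from Scrublet, and applying the gene filter before the cell filter is an assumption. The toy call uses a smaller min_genes so the example stays tiny.

```python
def quality_filter(counts, doublet_scores, min_cells=3, min_genes=200,
                   max_doublet=0.2):
    """Return (kept_cell_indices, kept_gene_indices).

    counts: one list of per-gene counts for each cell.
    doublet_scores: one precomputed score per cell (hypothetical input).
    """
    n_genes = len(counts[0]) if counts else 0
    # A gene counts as "detected" in a cell when its count is above zero.
    cells_per_gene = [sum(1 for row in counts if row[g] > 0)
                      for g in range(n_genes)]
    keep_genes = [g for g in range(n_genes) if cells_per_gene[g] >= min_cells]
    keep_cells = []
    for i, row in enumerate(counts):
        detected = sum(1 for g in keep_genes if row[g] > 0)
        # Drop cells with too few detected genes or a doublet score above the cutoff.
        if detected >= min_genes and doublet_scores[i] <= max_doublet:
            keep_cells.append(i)
    return keep_cells, keep_genes


# Toy data: 4 cells x 4 genes, with made-up doublet scores.
toy_counts = [
    [1, 0, 2, 0],
    [3, 1, 0, 0],
    [0, 2, 1, 0],
    [1, 1, 1, 1],
]
toy_scores = [0.05, 0.10, 0.30, 0.02]
cells, genes = quality_filter(toy_counts, toy_scores, min_cells=3, min_genes=2)
```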

Counts were normalized per cell to 10,000 total counts and log-transformed as ln(x + 1). Spatial encoding used each cell’s centroid coordinates, extracted from Xenium metadata, to construct spatial graphs. Both a short-range contact graph with a 20 μm radius and a broader graph with an 80 μm radius were created and then combined into a hybrid adjacency for spatial smoothing and clustering.
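
The normalization and the two radius graphs can be sketched as below. The function names, the brute-force pairwise search, and the toy coordinates are assumptions for illustration; a real pipeline over roughly 190,000 cells would use a spatial index, and the rule for combining the two graphs is not specified in the article, so the sketch leaves them separate.

```python
import math


def normalize_log1p(cell_counts, target_sum=10_000.0):
    """Scale one cell's counts to target_sum, then apply ln(x + 1)."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0 for _ in cell_counts]
    return [math.log1p(c * target_sum / total) for c in cell_counts]


def radius_graph(coords, radius):
    """Undirected edges (i, j) with i < j whose centroids lie within radius.

    Brute force, O(n^2): fine for a toy example only.
    """
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                edges.add((i, j))
    return edges


normed = normalize_log1p([1, 3])

# Toy centroids in micrometres (made-up positions).
centroids = [(0.0, 0.0), (15.0, 0.0), (70.0, 0.0), (200.0, 0.0)]
contact_edges = radius_graph(centroids, radius=20.0)   # short-range contact graph
broad_edges = radius_graph(centroids, radius=80.0)     # broader context graph
```

Because every pair within 20 μm is also within 80 μm, the contact edges are a subset of the broad edges in this construction.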

This hybrid adjacency integrates both local proximity and broader contextual connections within the tissue structure. Spatial Nonnegative Matrix Factorization (SNMF) was implemented as a lightweight baseline, enhancing standard NMF embeddings through spatial smoothing of cell factors over local neighbourhoods.
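
One simple way to picture the spatial smoothing step is to blend each cell's factor vector with the mean of its neighbours' vectors. The sketch below assumes exactly that; the function name smooth_factors, the blending weight lam, and the single smoothing pass are illustrative choices and may differ from the published SNMF procedure.

```python
def smooth_factors(factors, neighbors, lam=0.5):
    """Blend each cell's factor vector with the mean of its neighbours.

    factors: one nonnegative factor vector (list of floats) per cell.
    neighbors: dict mapping a cell index to a list of neighbour indices.
    lam: weight on the neighbourhood mean (hypothetical parameter).
    """
    smoothed = []
    for i, vec in enumerate(factors):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            # Isolated cells keep their original factors.
            smoothed.append(list(vec))
            continue
        mean = [sum(factors[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(vec))]
        smoothed.append([(1.0 - lam) * v + lam * m for v, m in zip(vec, mean)])
    return smoothed


toy_factors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_neighbors = {0: [1], 1: [0], 2: []}
smoothed = smooth_factors(toy_factors, toy_neighbors, lam=0.5)
```

Since each output is a convex combination of nonnegative vectors, the smoothed factors stay nonnegative.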

Hybrid Spatial NMF (hSNMF) then performed spatially regularized NMF followed by Leiden clustering on the hybrid adjacency, jointly optimizing geometric contiguity and molecular coherence. Performance was benchmarked against Randomized Spatial PCA (RASP), a spatially aware dimensionality reduction method for large-scale spatial transcriptomics data that uses a randomized two-stage PCA framework with sparse matrix operations. These benchmarks allowed a comparison of spatial dimensionality reduction techniques and their ability to capture tissue organization in high-dimensional gene expression data.

Spatial embedding quality assessed via compactness, autocorrelation and cluster separation reveals meaningful patterns

Spatial nonnegative matrix factorization (NMF) methods, SNMF and hSNMF, demonstrate markedly improved compactness with CHAOS values consistently below 0.004. Moran’s I values exceeded 0.96, confirming strong spatial autocorrelation within the embeddings generated by these models. Cluster separability was also significantly enhanced, as evidenced by Silhouette scores greater than 0.12 and Davies-Bouldin indices (DBI) less than 1.8.
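
For readers unfamiliar with Moran's I, the sketch below computes the standard statistic for one value per cell over an unweighted neighbour graph. Which quantity the authors autocorrelate and which spatial weights they use are not stated here, so the binary edge weights, the function name, and the toy values are assumptions.

```python
def morans_i(values, edges):
    """Moran's I for one value per cell on an undirected, unweighted graph.

    edges: iterable of (i, j) pairs; each pair gets weight 1 in both directions.
    """
    edges = list(edges)
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0 or not edges:
        raise ValueError("need non-constant values and at least one edge")
    # Cross-products over neighbouring pairs, counted in both directions.
    num = 2.0 * sum(dev[i] * dev[j] for i, j in edges)
    w_total = 2.0 * len(edges)
    return (n / w_total) * num / denom


chain = [(0, 1), (1, 2), (2, 3)]
clustered = morans_i([1.0, 1.0, 5.0, 5.0], chain)     # similar values sit together
alternating = morans_i([1.0, 5.0, 1.0, 5.0], chain)   # neighbours always differ
```

Values near 1 indicate that neighbouring cells carry similar values, values near 0 indicate no spatial structure, and negative values indicate that neighbours tend to differ.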

These quantitative metrics collectively indicate that SNMF and hSNMF effectively capture and preserve the spatial organization of cells within the analyzed cholangiocarcinoma dataset. Pareto-front analysis revealed that SNMF and hSNMF consistently achieve high spatial compactness and autocorrelation across a range of latent dimensionalities and Leiden clustering resolutions.
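
A Pareto front over two of these metrics can be extracted as in the sketch below, which keeps every configuration that no other configuration beats on both CHAOS (lower is better) and Moran's I (higher is better). The configuration names and numbers are made up for illustration and are not results from the paper; the authors' analysis may also consider more than two objectives.

```python
def pareto_front(configs):
    """Return the configs that are not dominated by any other config.

    Each config is a dict with 'chaos' (lower is better) and
    'moran' (higher is better).
    """
    def dominates(a, b):
        no_worse = a["chaos"] <= b["chaos"] and a["moran"] >= b["moran"]
        strictly_better = a["chaos"] < b["chaos"] or a["moran"] > b["moran"]
        return no_worse and strictly_better

    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


# Hypothetical configurations with toy metric values.
toy_configs = [
    {"name": "A", "chaos": 0.30, "moran": 0.50},
    {"name": "B", "chaos": 0.20, "moran": 0.40},
    {"name": "C", "chaos": 0.40, "moran": 0.45},
    {"name": "D", "chaos": 0.25, "moran": 0.60},
]
front = pareto_front(toy_configs)
front_names = [c["name"] for c in front]
```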

Specifically, the optimal configurations for these methods yielded CHAOS values below 0.004 and Moran’s I values between 0.96 and 0.98. SNMF and hSNMF also produced positive Silhouette values ranging from 0.15 to 0.27, alongside moderate Davies-Bouldin indices, signifying well-separated and cohesive clusters.
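
The two separability metrics follow standard textbook definitions, sketched below in plain Python with Euclidean distance. The toy points and labels are made up; in practice these scores would be computed on the learned embeddings, typically with a library implementation.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better); assumes distinct centroids."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids = {lab: tuple(sum(c) / len(pts) for c in zip(*pts))
                 for lab, pts in clusters.items()}
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
               for lab, pts in clusters.items()}
    worst = [max((scatter[a] + scatter[b]) / math.dist(centroids[a], centroids[b])
                 for b in clusters if b != a)
             for a in clusters]
    return sum(worst) / len(worst)


toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
sil = silhouette_score(toy_points, toy_labels)
dbi = davies_bouldin(toy_points, toy_labels)
```

Silhouette values lie between -1 and 1, with higher values meaning tighter, better-separated clusters, while a lower Davies-Bouldin index means less overlap between clusters.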

The hybrid hSNMF formulation further improved performance, attaining both higher Silhouette scores and enrichment while maintaining low marker exclusion rates and high cluster marker coherence. UMAP projections visually confirm these findings, with hSNMF generating compact, well-separated clusters possessing smooth boundaries.

In contrast, the non-spatial baseline, NSF, produced elongated and overlapping clusters lacking spatial organization, consistent with its low Silhouette scores and negative Moran’s I values. RASP, another baseline, exhibited modest geometric performance but weaker spatial consistency, as reflected in its lower Moran’s I.

Spatial Transcriptomics Analysis via Integrated Factor Diffusion and Graph Construction reveals nuanced cellular organization

Scientists have developed two new methods, Spatial NMF (SNMF) and Hybrid Spatial NMF (hSNMF), to improve the analysis of high-dimensional single-cell data generated by platforms like Xenium. These techniques extend nonnegative matrix factorization by incorporating spatial information, addressing a key challenge in representing and clustering data from spatially resolved transcriptomics.

SNMF introduces spatial smoothness by diffusing each cell’s factor vector across its neighbourhood, while hSNMF integrates both spatial proximity and transcriptomic similarity using a hybrid graph approach. Evaluations using a cholangiocarcinoma dataset demonstrate that both SNMF and hSNMF significantly enhance spatial compactness and biological consistency when compared to existing non-spatial methods.

Specifically, the algorithms achieve improved cluster separation, as indicated by Silhouette scores and the Davies-Bouldin index, alongside enhanced biological coherence as measured by cluster marker coherence (CMC) and enrichment analysis. The hybrid formulation of hSNMF appears to provide the most balanced integration of geometric and molecular information, resulting in spatially cohesive and biologically meaningful clusters.

The authors acknowledge that their framework currently utilizes single-scale spatial graphs and plan to extend this work by incorporating multi-scale graphs for a more comprehensive analysis. Future research will also involve comparisons against deep generative spatial models to further validate the performance and capabilities of these new methods.

👉 More information
🗞 hSNMF: Hybrid Spatially Regularized NMF for Image-Derived Spatial Transcriptomics
🧠 ArXiv: https://arxiv.org/abs/2602.02638

Rohail T.

