AI Predicts Cancer Gene Activity from Tissue Samples across Multiple Cancer Types

Predicting spatial transcriptomics from histology images promises to revolutionise cancer research, but existing techniques are mostly trained on a single tissue type and struggle to scale across diverse cancers. Susu Hu and Stefanie Speidel from the National Center for Tumor Diseases (NCT/UCC) Dresden, together with colleagues, address this limitation by presenting MoLF (Mixture-of-Latent-Flow), a generative model for pan-cancer histogenomic prediction. MoLF integrates data from multiple cancer types and handles their heterogeneity by dynamically routing inputs to specialised sub-networks. Their experiments show that MoLF achieves state-of-the-art performance and, crucially, exhibits zero-shot generalisation to cross-species data, suggesting that it captures fundamental, conserved histo-molecular mechanisms.

Predicting pan-cancer spatial gene expression with a Mixture-of-Latent-Flow generative model

Scientists have developed a new generative model, MoLF (Mixture-of-Latent-Flow), that predicts spatial gene expression from standard histology images across a diverse range of cancers. The work addresses a critical limitation of current histogenomic profiling methods, which are typically restricted to single tissue types and fail to capitalise on biological principles shared between cancers.

The research sets a new state of the art, consistently exceeding both specialised models and existing foundation models on pan-cancer benchmarks. MoLF overcomes the morphological heterogeneity of diverse tissues by optimising distinct tissue patterns in separate sub-networks, enabling robust and scalable analysis.

The study introduces a novel architecture that uses a conditional Flow Matching objective to map random noise directly to the gene latent manifold. The flow is parameterised by a Mixture-of-Experts (MoE) velocity field, which decouples the optimisation of diverse tissue characteristics. By dynamically routing inputs to specialised sub-networks, MoLF avoids the parameter interference that plagues monolithic architectures when applied to pan-cancer data.

Experiments demonstrate that this approach not only improves prediction accuracy but also facilitates zero-shot generalisation to cross-species data, suggesting the model captures fundamental, conserved histo-molecular mechanisms. Researchers structured the latent gene space using a Variational Autoencoder (VAE) and subsequently modelled the conditional distribution via flow matching.

This two-stage framework allows for robust pan-cancer prediction without requiring extensive pre-training resources. The model’s design explicitly mitigates parameter interference, enabling it to capture both shared biological motifs and tissue-specific nuances. This innovation moves beyond deterministic regression and masked prediction approaches, offering a truly generative solution for inferring spatial transcriptomics from readily available histological data.
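As a rough illustration of the generative step in such a two-stage design, the sketch below starts from Gaussian noise, follows a velocity field with fixed-step Euler updates to reach a latent vector, and then passes that vector through a decoder. The names `velocity_fn`, `decode_fn` and `generate_expression`, the step count and the choice of an Euler solver are all assumptions made for illustration; the article does not say which solver or interfaces the authors use.

```python
import random


def generate_expression(velocity_fn, decode_fn, cond, dim, steps=50, rng=None):
    """Sample one gene-expression vector for a histology condition `cond`.

    velocity_fn(z, t, cond) -> list of floats (hypothetical trained velocity field)
    decode_fn(z)            -> list of floats (hypothetical VAE decoder)
    """
    rng = rng or random.Random()
    # Start from standard Gaussian noise in the latent space.
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # current time in [0, 1)
        v = velocity_fn(z, t, cond)
        # Explicit Euler update along the predicted velocity.
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    # Map the final latent vector back to gene-expression space.
    return decode_fn(z)
```

Because each call begins from a fresh noise draw, repeated calls with the same `cond` can return different plausible outputs, which is the practical difference from a deterministic regressor.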

Furthermore, MoLF’s ability to generalise to cross-species data suggests it has learned biological principles that extend beyond human cancers. This points to possible applications in veterinary medicine and comparative oncology. MoLF could make histogenomic analysis more scalable, supporting a deeper understanding of cancer biology and potentially accelerating the development of more effective therapies.

MoLF architecture and conditional generative modelling of histogenomic data

A conditional Flow Matching objective underpins the MoLF architecture, mapping random noise directly to the gene latent manifold. This process establishes a generative framework for pan-cancer histogenomic prediction, moving beyond deterministic regression approaches that assume a one-to-one relationship between morphology and gene expression.
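A conditional flow matching loss of this kind can be sketched in a few lines. The version below assumes the common straight-line path between a noise sample and the target latent, for which the regression target is simply their difference; the article does not state which probability path MoLF uses, and `cfm_training_pair`, `cfm_loss` and `velocity_fn` are hypothetical names.

```python
import random


def cfm_training_pair(x1, rng):
    """Build one training example for a target latent vector x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
    t = rng.random()                        # time drawn uniformly from [0, 1)
    # Point on the straight line from x0 (t = 0) to x1 (t = 1).
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Velocity of that straight line, constant in t.
    target_v = [b - a for a, b in zip(x0, x1)]
    return t, xt, target_v


def cfm_loss(velocity_fn, x1, cond, rng):
    """Mean squared error between predicted and target velocity."""
    t, xt, target_v = cfm_training_pair(x1, rng)
    pred_v = velocity_fn(xt, t, cond)  # hypothetical model call
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(x1)
```

In training, `velocity_fn` would be a neural network conditioned on histology features and the loss would be averaged over a batch; here it is any callable with the same signature.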

The study implemented a Variational Autoencoder (VAE) to structure the latent gene space, effectively reducing dimensionality and enabling more efficient modelling of complex gene expression patterns. Following latent space construction, MoLF employs a Mixture-of-Experts (MoE) velocity field to parameterize the conditional distribution.

This MoE dynamically routes input histological data to specialized sub-networks, or ‘experts’, each trained to focus on specific tissue patterns. By decoupling the optimization of diverse tissue types, the MoE architecture mitigates parameter interference, a common challenge in pan-cancer modelling where conflicting learning signals can arise from morphological heterogeneity.

Experiments involved training and evaluating MoLF across pan-cancer benchmarks, comparing its performance against both specialized, single-tissue models and existing foundation models. The research team assessed zero-shot generalization capabilities by testing MoLF’s ability to predict gene expression in cross-species data, revealing its capacity to capture conserved histo-molecular mechanisms. Performance was evaluated by comparing predicted spatial transcriptomics data with ground truth data, demonstrating consistent outperformance of baseline models and establishing a new state-of-the-art in pan-cancer histogenomic prediction.

MoLF architecture outperforms baselines in pan-cancer histogenomic prediction and gene latent manifold mapping

On a synthetic eight-Gaussian task, the Mixture-of-Latent-Flow (MoLF) architecture achieved a 2-Wasserstein distance of 0.292, where lower is better. A Dense Transformer baseline reached 0.315 on the same task.
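For readers unfamiliar with the metric, the 2-Wasserstein distance measures how far one set of samples must be moved to match another. The sketch below covers only the one-dimensional case with equally sized samples, where the optimal matching is obtained by sorting. This is a simplification for illustration: the article does not describe how the reported values were estimated, and a multi-dimensional task would need a general optimal-transport solver. The function name `w2_1d` is hypothetical.

```python
import math


def w2_1d(a, b):
    """Empirical 2-Wasserstein distance between two equal-sized 1D samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal coupling pairs the i-th smallest values.
    sa, sb = sorted(a), sorted(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa))
```

A value of zero means the two samples coincide after sorting, and a uniform shift of every point by a constant `c` gives a distance of `abs(c)`.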

The MoLF model effectively decoupled the learning of multi-modal distributions, simplifying the task of mapping noise to the gene latent manifold. Experiments conducted on the HEST-1k pan-cancer benchmark reveal that MoLF outperforms existing specialized and foundation model baselines. The research utilizes a curated gene panel comprising the union of 50 MSigDB Hallmark pathways and the top-50 cancer-specific Highly Variable Genes, establishing a robust evaluation framework.

MoLF achieves superior performance on the Top-50 Highly Variable Genes, with mean and standard deviation reported across two independent training splits. The architecture uses a Top-2 gating strategy with six experts to ensure training stability and maximize expert utilization. This configuration mitigates the risk of routing collapse while providing sufficient capacity to model pan-cancer heterogeneity.
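The routing idea can be illustrated with a small sketch of Top-2 gating over six experts. It assumes the router produces one score per expert and that the two highest scores are renormalised with a softmax, which is a common convention rather than a detail confirmed by the article. The functions `top_k_gate`, `moe_velocity` and `expert_utilisation` are hypothetical, and in a real model the scores and experts would be learned networks.

```python
import math


def top_k_gate(logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                   # subtract max for stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}    # weights sum to 1


def moe_velocity(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(x)
    for i, w in top_k_gate(router_logits, k).items():
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def expert_utilisation(batch_logits, num_experts, k=2):
    """Fraction of routing slots assigned to each expert over a batch."""
    counts = [0] * num_experts
    for logits in batch_logits:
        for i in top_k_gate(logits, k):
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

A utilisation vector that concentrates on one or two experts across a large batch would be a sign of the routing collapse the paragraph mentions, while a roughly even spread indicates that all six experts are being used.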

The study demonstrates that MoLF’s conditional flow-matching framework effectively addresses the challenges of spatial transcriptomics prediction, resolving the ill-posed inverse problem where similar histological morphologies can correspond to diverse molecular states. Furthermore, the work exhibits zero-shot generalization to cross-species data, suggesting the capture of fundamental, conserved histo-molecular mechanisms.

Pan-cancer image analysis reveals transferable histo-molecular representations with MoLF

Researchers have developed a new generative model, MoLF (Mixture-of-Latent-Flow), for predicting gene expression from histological images across multiple cancer types. The model uses a conditional Flow Matching objective and a Mixture-of-Experts architecture to map random noise, conditioned on the histology image, to a gene latent manifold, which lets it decouple the optimization of diverse tissue patterns.

Experiments demonstrate that MoLF achieves state-of-the-art performance on pan-cancer benchmarks, surpassing both specialized models and existing foundation models. MoLF’s architecture enables it to learn robust and transferable features by training on a diverse, pan-cancer dataset, rather than being limited by tissue-specific training.

The model also exhibits zero-shot generalization to cross-species data, indicating that it captures fundamental and conserved histo-molecular mechanisms. Ablation studies confirm the importance of the Mixture-of-Experts component, which avoids capacity bottlenecks by routing signals to specialized sub-networks. They also show that while highly variable gene expression is driven by local patch morphology, broader Hallmark Pathway prediction relies on regional tissue architecture.

The authors acknowledge that removing spatial awareness through ablation of positional encoding slightly improved performance on highly variable gene prediction, suggesting these genes are primarily driven by local morphology. However, this negatively impacted the prediction of Hallmark Pathways, which depend on broader tissue architecture. Future research could focus on refining the integration of spatial information to further enhance the model’s performance on structure-dependent pathways and exploring the application of MoLF to other biological domains where histogenomic profiling is valuable.

👉 More information
🗞 MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology
🧠 ArXiv: https://arxiv.org/abs/2602.02282

Rohail T.
