AI Generates Histopathology Images for Cancer Diagnosis and Research.

PixCell, a diffusion-based generative model trained on a large histopathology dataset, synthesises realistic images applicable to cancer research. These synthetic images facilitate data augmentation, privacy-preserving data sharing, and virtual staining, even inferring molecular marker results from standard H&E staining. The trained models are publicly available.

The increasing availability of digitised histological slides presents both opportunity and challenge for cancer research. While these datasets facilitate detailed analysis, annotated data remains limited and data sharing is often constrained by regulatory hurdles. Researchers are now exploring generative artificial intelligence models to address these issues, synthesising realistic images to augment existing datasets and potentially infer information not directly observable. A collaborative team, comprising Srikar Yellapragada, Alexandros Graikos, Zilinghan Li, Kostas Triaridis, Varun Belagali, Saarthak Kapse, Tarak Nath Nandi, Ravi K Madduri, Prateek Prasanna, Tahsin Kurc, Rajarsi R Gupta, Joel Saltz, and Dimitris Samaras, details the development of PixCell, a diffusion-based generative foundation model for digital histopathology, in a paper of the same name.

Diffusion Model Generates Synthetic Histology Slides, Addressing Data Challenges in Cancer Research

The increasing digitisation of pathology slides is generating substantial datasets with potential for advancing cancer diagnosis and research. However, limitations in annotated data, alongside data privacy concerns, present significant obstacles. Researchers have developed PixCell, a diffusion-based generative foundation model, to address these challenges. Trained on the PanCan-30M dataset – comprising 69,184 haematoxylin and eosin (H&E)-stained whole slide images (WSIs) representing multiple cancer types – PixCell offers a novel approach to data augmentation and analysis.

Diffusion models function by learning to reverse a process that gradually adds noise to data, ultimately enabling the generation of new, realistic samples. PixCell employs a progressive training strategy and self-supervision – a technique where the model learns from the inherent structure of the data without requiring manual labels – to facilitate large-scale training. This eliminates the need for extensive, costly annotation. The model generates diverse, high-quality images across various cancer types and can serve as a substitute for real data when training other machine learning models.
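The noising process that a diffusion model learns to reverse can be illustrated with a minimal sketch. The code below follows the generic textbook (DDPM-style) formulation, not PixCell's actual implementation: the linear noise schedule, the function names, and the four-value stand-in for an image are all assumptions made for illustration. It adds Gaussian noise to a sample in closed form, then shows that the clean sample can be recovered exactly when the added noise is known, which is the quantity a trained network would be asked to estimate.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (an assumed, common choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all steps s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: noise a clean sample directly to step t."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def recover_x0(xt, eps_estimate, t, alpha_bars):
    """Invert the forward step given an estimate of the added noise."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a)
            for x, e in zip(xt, eps_estimate)]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

x0 = [0.2, -0.5, 0.9, 0.0]  # hypothetical stand-in for normalised pixel values
xt, eps = add_noise(x0, 500, alpha_bars, rng)

# With the true noise, the clean sample is recovered up to floating-point error.
# A real model replaces `eps` with a neural network's prediction.
x0_hat = recover_x0(xt, eps, 500, alpha_bars)
```

In practice the noise estimate comes from a trained network and generation proceeds over many small denoising steps rather than a single inversion; the sketch only shows why predicting the noise is enough to recover the signal.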

Synthetic images generated by PixCell also ease data sharing. Compared with clinical images, they face fewer regulatory hurdles, which can foster collaboration and accelerate research. The researchers validated PixCell’s capabilities through several applications. Mask-guided image generation, in which specific areas of the image are targeted for modification, supports data augmentation. The augmented data improves performance on tasks such as cell segmentation (the automated identification of cells within an image) and can enhance the accuracy of diagnostic algorithms.

Furthermore, PixCell infers immunohistochemistry (IHC) staining from H&E images. IHC uses antibodies to detect specific proteins in tissue samples, providing crucial molecular information. PixCell leverages the structural information within the routinely used H&E staining to predict molecular marker expression, potentially reducing the need for costly and time-consuming IHC testing.

The PanCan-30M dataset underpinning PixCell comprises WSIs from a variety of sources, encompassing a broad range of organs and cancer types, including lung, kidney, colon, breast, liver, pancreas, prostate, skin, thyroid, and uterus. Data originates from established resources such as The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project, which supports broad coverage and generalisability.

The trained models are publicly released, giving the computational pathology community a resource for further research. By addressing data scarcity and privacy constraints, PixCell aims to help researchers make fuller use of digital pathology in cancer research.

👉 More information
🗞 PixCell: A generative foundation model for digital histopathology images
🧠 DOI: https://doi.org/10.48550/arXiv.2506.05127
