Imbalanced datasets frequently hinder accurate medical image classification, a particularly pressing issue when identifying diseases with varying prevalence, as highlighted by the recent COVID-19 pandemic. Sina Jahromi, Farshid Hajati, Alireza Rezaee, and Javaher Nourian from the University of Tehran and University of New England tackled this problem by developing a novel approach to augment limited medical imaging data. Their research introduces a progressive generative adversarial network to create synthetic images, effectively balancing datasets and improving the performance of diagnostic algorithms. This method combines generated data with real images, then optimises a ResNet classifier using a sophisticated population-based algorithm, ultimately achieving high accuracy in identifying COVID-19 from chest X-rays. The resulting model demonstrates significant improvements over existing techniques when tested on imbalanced datasets, offering a promising solution for more reliable medical diagnoses during outbreaks and beyond.
The research proposes a novel approach combining Progressive Growing of GANs (ProGAN) for data augmentation and a ResNet architecture optimised using the Slime Mould Algorithm (SMA). This methodology aims to address the scarcity of positive cases (images confirming the presence of the disease), which commonly hinders the performance of deep learning models. Experiments on a dataset of chest X-ray images evaluated the proposed ProGAN-SMA-ResNet model. They showed improved classification accuracy and robustness compared to conventional techniques on imbalanced medical image data, making the model a potential tool for enhanced diagnostic capabilities.
ProGAN Data Augmentation for Imbalanced X-rays
Researchers tackled the challenge of imbalanced datasets in medical image classification, a common problem particularly acute during pandemics when data for certain conditions is scarce. The study pioneered a novel approach employing a progressive generative adversarial network, or ProGAN, to generate synthetic chest X-ray images and augment limited real-world data, directly addressing the bias introduced by unequal class frequencies during training and application of classification models. The team engineered a system where ProGANs were customized for each class, effectively increasing the representation of under-represented conditions within the training dataset. To further refine the classification process, scientists developed a weighted combination strategy, integrating the synthetic images generated by the ProGAN with the original chest X-ray images.
This ensured a more balanced input for the deep network classifier, mitigating the impact of class imbalance on model performance. The study also harnessed the power of a multi-objective, meta-heuristic population-based optimization algorithm, the Slime Mould Algorithm (SMA), to precisely tune the classifier’s hyperparameters, iteratively refining the model’s settings to maximize its predictive capabilities. Experiments employed a large, imbalanced chest X-ray image dataset to rigorously test the proposed model. The ImageNet pre-trained ResNet50V2 model was utilized for classifying images into four categories: Normal, Viral Pneumonia, COVID-19, and Lung Opacity.
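To make the idea of population-based hyperparameter tuning more concrete, here is a minimal sketch in Python. It is a generic, single-objective population search and deliberately does not reproduce the Slime Mould Algorithm's update rules or the multi-objective setup described above. The search ranges in `BOUNDS`, the function names, and the `toy_validation_error` objective are all hypothetical stand-ins; in a real run the objective would train the classifier and return its validation error.

```python
import random

# Hypothetical search ranges; the article does not list the ones actually used.
BOUNDS = {"learning_rate": (1e-5, 1e-2), "dropout": (0.0, 0.6)}


def toy_validation_error(params):
    """Stand-in objective. A real run would train the classifier and
    return its validation error; this bowl-shaped function only makes
    the sketch runnable."""
    lr_term = (params["learning_rate"] - 1e-3) ** 2 * 1e4
    dropout_term = (params["dropout"] - 0.3) ** 2
    return lr_term + dropout_term


def clip(value, low, high):
    return max(low, min(high, value))


def random_candidate(rng):
    return {name: rng.uniform(low, high) for name, (low, high) in BOUNDS.items()}


def population_search(objective, pop_size=12, generations=30, seed=0):
    """Generic population-based search (not the SMA update rules)."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    best = min(population, key=objective)
    history = [objective(best)]
    for step in range(generations):
        # Shrink the random move size over time: explore first, then refine.
        scale = 1.0 - step / generations
        new_population = []
        for candidate in population:
            child = {}
            for name, (low, high) in BOUNDS.items():
                # Move towards the best candidate, plus bounded random noise.
                pull = rng.random() * (best[name] - candidate[name])
                noise = rng.uniform(-1.0, 1.0) * scale * 0.1 * (high - low)
                child[name] = clip(candidate[name] + pull + noise, low, high)
            # Keep the child only if it is no worse than its parent.
            if objective(child) <= objective(candidate):
                new_population.append(child)
            else:
                new_population.append(candidate)
        population = new_population
        best = min(population + [best], key=objective)
        history.append(objective(best))
    return best, history
```

Calling `best, history = population_search(toy_validation_error)` returns the best setting found and the best score after each generation. Because the best candidate is always carried forward, `history` never increases.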
The innovative ProGAN-based data augmentation, coupled with SMA-driven hyperparameter optimization, demonstrably improved classification accuracy, achieving 95.5% accuracy in a 4-class imbalanced classification scenario and 98.5% accuracy when distinguishing between just two classes. These successful outcomes validate the effectiveness of the proposed methodology in classifying medical images even with severely imbalanced data, a critical advancement for pandemic response and disease diagnosis. The research demonstrates a significant step forward in addressing data scarcity, enabling more reliable and accurate AI-driven medical image analysis. The combination of synthetic data generation and intelligent optimization provides a robust solution for improving classifier performance in challenging real-world scenarios.
ProGAN Augmentation Improves Imbalanced CXR Classification
Scientists have developed a novel approach to address the persistent challenge of imbalanced data in medical image classification, particularly relevant during pandemics where disease prevalence can skew datasets. The research team proposed a progressive generative adversarial network (ProGAN) to create synthetic data, effectively augmenting limited real-world examples and mitigating bias. This synthetic data was then combined with real chest X-ray (CXR) images using a weighted approach, ensuring a more balanced representation of each class during training. Experiments utilizing a large and imbalanced CXR dataset demonstrated the effectiveness of this method, achieving 95.5% accuracy when classifying images into four distinct classes, and 98.5% accuracy in a two-class imbalanced classification scenario.
The team employed a Slime Mould Algorithm (SMA) to optimize the classifier network’s hyperparameters, further enhancing performance. Measurements confirm the ProGAN’s ability to generate high-quality images that improve classification accuracy without introducing significant network bias. Researchers compiled a comprehensive dataset of 18,479 CXR images, comprising 8,851 normal cases, 6,012 non-COVID lung infections, and 3,616 positive COVID-19 cases, providing a substantial resource for future investigations. Comparative results favour the proposed model, which is reported to surpass existing frameworks that achieve 94% accuracy, 95% F1-score, 96% precision, and 95% recall, and to deliver an 18% accuracy increase over non-balanced algorithms. This makes it a powerful tool for medical diagnosis in situations with limited and imbalanced data.
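The class counts quoted above give a feel for how skewed the data is. The short Python calculation below works out each class's share of the 18,479 images (about 47.9%, 32.5% and 19.6%) and how many synthetic images each class would need to match the largest one. Topping every class up to the size of the largest is an assumption made for illustration; the article does not state the balancing target used in the study, and the dictionary keys are hypothetical labels.

```python
# Class counts as quoted in the article; the key names are illustrative.
counts = {"normal": 8851, "non_covid_infection": 6012, "covid19": 3616}

total = sum(counts.values())
largest = max(counts.values())

# Share of the whole dataset held by each class.
shares = {name: n / total for name, n in counts.items()}

# Ratio between the most and least frequent classes.
imbalance_ratio = largest / min(counts.values())

# Assumption: balance by topping every class up to the largest one.
synthetic_needed = {name: largest - n for name, n in counts.items()}

for name in counts:
    print(f"{name}: {counts[name]} images, {shares[name]:.1%} of total, "
          f"{synthetic_needed[name]} synthetic images to match the largest class")
print(f"total = {total}, imbalance ratio = {imbalance_ratio:.2f}")
```

Under that assumption, the COVID-19 class would need 5,235 generated images and the non-COVID infection class 2,839.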
Synthetic Data Boosts Chest X-ray Classification Accuracy
This study addresses the challenge of imbalanced datasets in medical image classification, a common problem that is particularly acute during pandemics. Researchers developed a progressive generative adversarial network (ProGAN) to create synthetic chest X-ray images, effectively augmenting limited real-world data. These synthetic images, combined with real images using a weighted approach, were then used to train a ResNet50V2 classifier, with hyperparameters optimised via the Slime Mould Algorithm (SMA), a multi-objective meta-heuristic. The resulting model demonstrated significant improvements in classifying imbalanced datasets of chest X-rays, achieving 95.5% and 98.5% accuracy for four-class and two-class problems respectively.
Experiments revealed that a 20% injection rate of synthetic images, alongside optimised learning rates and dropout, enhanced classification accuracy by 3.53% and reduced the standard deviation, suggesting improved robustness. The work confirms the potential of contemporary ProGANs to generate useful synthetic data for improving medical image classification. The authors acknowledge that the findings are specific to the investigated dataset and ResNet50V2 architecture. Future research should explore the generalizability of this approach across different medical imaging modalities and network architectures. Further investigation into the optimal weighting strategies for combining real and synthetic data could also refine the methodology and improve performance.
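As a rough illustration of what an injection rate could look like in practice, the Python sketch below mixes a pool of generated images into a set of real ones. It assumes the rate is measured relative to the number of real images, so a rate of 0.2 adds one synthetic image for every five real ones. The article does not define the rate precisely, and the function name `inject_synthetic` and the file names are hypothetical.

```python
import random


def inject_synthetic(real_items, synthetic_pool, injection_rate=0.2, seed=0):
    """Return a shuffled training list of (item, is_synthetic) pairs.

    Assumption: the number of synthetic items added equals
    injection_rate * len(real_items), capped by the pool size.
    """
    if not 0.0 <= injection_rate <= 1.0:
        raise ValueError("injection_rate must be between 0 and 1")
    rng = random.Random(seed)
    n_synthetic = min(round(injection_rate * len(real_items)), len(synthetic_pool))
    # Draw the synthetic items without replacement from the generated pool.
    chosen = rng.sample(synthetic_pool, n_synthetic)
    mixed = [(item, False) for item in real_items]
    mixed += [(item, True) for item in chosen]
    rng.shuffle(mixed)
    return mixed


# Placeholder file names standing in for real and generated X-ray images.
real = [f"real_{i}.png" for i in range(100)]
generated = [f"synthetic_{i}.png" for i in range(50)]
training_set = inject_synthetic(real, generated, injection_rate=0.2)
```

Keeping the `is_synthetic` flag alongside each item makes it easy to apply a different sample weight to generated images later, should a weighted combination be wanted.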
👉 More information
🗞 Medical Image Classification on Imbalanced Data Using ProGAN and SMA-Optimized ResNet: Application to COVID-19
🧠 ArXiv: https://arxiv.org/abs/2512.24214
