Large Language Models Show Promise in Diagnosing Rare Diseases

Rare diseases affect approximately 300 million people worldwide, and they remain hard to diagnose, largely because few physicians have experience with them. Recent reports have highlighted the potential of Large Language Models (LLMs) to help diagnose rare diseases in clinical settings. To evaluate that potential, researchers have introduced RareBench, a benchmark that assesses LLMs on accuracy, efficiency, robustness, and interpretability. The framework draws on a rare disease knowledge graph to standardize evaluation, so that diagnostic models can be compared and improved.

Can Large Language Models Serve as Rare Diseases Specialists?

Diagnosing rare diseases is difficult. The complexity of differentiating among many rare conditions, combined with a lack of experienced physicians, often leads to unsatisfactory clinical diagnosis rates. Recent news has highlighted the potential of LLMs in this setting, most notably the case in which ChatGPT correctly diagnosed a 4-year-old’s rare disease after 17 doctors had failed to do so.

That potential, however, has not been evaluated systematically. To bridge this gap, the researchers introduce RareBench, a benchmark that evaluates LLMs on four dimensions within the realm of rare diseases: accuracy, efficiency, robustness, and interpretability. Drawing on a comprehensive rare disease knowledge graph synthesized from multiple knowledge bases, RareBench provides a standardized framework for evaluating LLMs’ diagnostic capabilities.

Alongside RareBench, the researchers compiled the largest open-source dataset on rare disease patients, establishing a benchmark for future studies in this domain. The dataset lets researchers train and evaluate LLMs on large-scale, real-world cases, which should help produce more accurate and reliable diagnostic models.

Dynamic Few-Shot Prompt Methodology: Enhancing LLMs’ Diagnostic Performance

To facilitate differential diagnosis of rare diseases, the researchers develop a dynamic few-shot prompt methodology. The approach draws on a comprehensive rare disease knowledge graph, synthesized from multiple knowledge bases, and significantly enhances LLMs’ diagnostic performance. Because the prompts are tailored to the specific characteristics of each rare disease, the LLM can learn from a small number of labeled examples and generalize to unseen cases.
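
The idea of a few-shot prompt whose examples change per query can be sketched in a few lines of Python. This is only an illustration, not the RareBench implementation: it assumes that examples are chosen by Jaccard overlap between phenotype sets, whereas the paper derives similarity from its knowledge graph. The names `Case`, `jaccard`, `select_examples`, and `build_prompt`, the prompt wording, and the three hand-written toy cases are all hypothetical.

```python
from typing import NamedTuple


class Case(NamedTuple):
    """A previously diagnosed patient: phenotype terms plus the confirmed disease."""
    phenotypes: frozenset[str]
    diagnosis: str


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two phenotype sets: 0.0 if disjoint, 1.0 if identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(query: frozenset[str], pool: list[Case], k: int = 2) -> list[Case]:
    """Pick the k pool cases most similar to the query (assumed similarity: Jaccard)."""
    return sorted(pool, key=lambda c: jaccard(query, c.phenotypes), reverse=True)[:k]


def build_prompt(query: frozenset[str], pool: list[Case], k: int = 2) -> str:
    """Assemble a few-shot prompt whose worked examples depend on the query patient."""
    lines = ["You are a rare disease specialist. Give a differential diagnosis.", ""]
    for case in select_examples(query, pool, k):
        lines.append(f"Phenotypes: {', '.join(sorted(case.phenotypes))}")
        lines.append(f"Diagnosis: {case.diagnosis}")
        lines.append("")
    lines.append(f"Phenotypes: {', '.join(sorted(query))}")
    lines.append("Diagnosis:")
    return "\n".join(lines)


# Toy, hand-written cases for illustration only (not taken from the RareBench dataset).
POOL = [
    Case(frozenset({"arachnodactyly", "ectopia lentis", "aortic root dilatation",
                    "tall stature"}), "Marfan syndrome"),
    Case(frozenset({"ectopia lentis", "tall stature", "intellectual disability",
                    "thromboembolism"}), "Homocystinuria"),
    Case(frozenset({"recurrent respiratory infections", "failure to thrive"}),
         "Cystic fibrosis"),
]
QUERY = frozenset({"ectopia lentis", "tall stature", "arachnodactyly"})

print(build_prompt(QUERY, POOL, k=2))
```

With this toy pool, the two cases that share phenotypes with the query are placed in the prompt and the unrelated case is left out, which is the sense in which the prompt is "dynamic". A production system would replace the set overlap with a similarity measure grounded in the knowledge graph.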

The experimental findings underscore the promising potential of integrating LLMs into the clinical diagnostic process for rare diseases. In the reported results, LLMs diagnose rare diseases with high accuracy, which suggests they could be a valuable tool for supporting clinicians in diagnosis and treatment.

Comparative Study: GPT-4’s Diagnostic Capabilities vs. Specialist Physicians

To further evaluate LLMs’ diagnostic capabilities, the researchers conduct an exhaustive comparative study between GPT-4 and specialist physicians. In that study, GPT-4 achieves high accuracy in diagnosing rare diseases, even when measured against the specialists.
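
The article does not say how accuracy was scored in this comparison. One common choice for differential diagnosis, where a model or a physician returns a ranked list of candidates, is top-k recall: the share of patients whose confirmed diagnosis appears among the first k candidates. The sketch below assumes that metric and assumes exact string matching of disease names; the function name `top_k_recall` and the placeholder disease labels are hypothetical.

```python
def top_k_recall(ranked_predictions: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of patients whose confirmed diagnosis is among the first k candidates."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("need exactly one ranked candidate list per patient")
    if not truths:
        return 0.0
    # Exact string match is an assumption; real evaluations usually map names to disease IDs.
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


# Placeholder labels for four hypothetical patients, most likely candidate first.
predictions = [
    ["disease A", "disease B", "disease C"],
    ["disease B", "disease D", "disease A"],
    ["disease C", "disease A", "disease E"],
    ["disease F", "disease G", "disease H"],
]
confirmed = ["disease A", "disease A", "disease E", "disease Z"]

print(top_k_recall(predictions, confirmed, k=1))
print(top_k_recall(predictions, confirmed, k=3))
```

Applying the same function to the ranked lists produced by a model and by physicians on the same patients gives directly comparable numbers.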

The findings have significant implications for AI-powered diagnostic tools in the field of rare diseases. Building on the capabilities of LLMs could yield more accurate and reliable diagnostic models that support clinicians in diagnosis and treatment. This opens up future research directions, including the integration of LLMs into clinical workflows and the development of personalized medicine approaches.

Conclusion

In conclusion, the study demonstrates the potential of LLMs in diagnosing rare diseases. RareBench, a benchmark that systematically evaluates LLMs on four critical dimensions within the realm of rare diseases, provides a standardized framework for assessing their diagnostic capabilities.

The dynamic few-shot prompt methodology and the largest open-source dataset on rare disease patients make RareBench a valuable resource for researchers working toward more accurate and reliable diagnostic models.

Going forward, continued study of LLMs’ capabilities in diagnosing rare diseases will be essential if these tools are to yield more effective diagnostic approaches that support clinicians in diagnosis and treatment.

Publication details: “RareBench: Can LLMs Serve as Rare Diseases Specialists?”
Publication Date: 2024-08-24
Authors: Xuanzhong Chen, Xiaohao Mao, Qihan Guo, Li Wang, et al.
DOI: https://doi.org/10.1145/3637528.3671576

Dr. Donovan
