Large Language Models Show Promise in Diagnosing Rare Diseases

Rare diseases affect approximately 300 million people worldwide, posing significant diagnostic challenges due to a lack of experienced physicians. Recent breakthroughs have highlighted the potential of Large Language Models (LLMs) in clinically diagnosing rare diseases. To evaluate LLMs’ capabilities, researchers introduce RareBench, a pioneering benchmark assessing accuracy, efficiency, robustness, and interpretability. This comprehensive framework leverages a rare disease knowledge graph to standardize evaluations, enabling the development of more accurate diagnostic models.

Can Large Language Models Serve as Rare Diseases Specialists?

The diagnosis of rare diseases is a significant challenge, with approximately 300 million people worldwide affected by these conditions. The complexity of differentiating among many rare diseases often leads to unsatisfactory clinical diagnosis rates, primarily due to a lack of experienced physicians. Recent news has highlighted the potential of Large Language Models (LLMs) in clinically diagnosing rare diseases, as seen in the case where ChatGPT correctly diagnosed a 4-year-old’s rare disease after 17 doctors failed.

Anecdotes like this do not amount to a systematic evaluation. To bridge this research gap and assess the capabilities of LLMs on four critical dimensions within the realm of rare diseases, we introduce RareBench. This pioneering benchmark is designed to assess LLMs’ performance in terms of accuracy, efficiency, robustness, and interpretability. By leveraging a comprehensive rare disease knowledge graph synthesized from multiple knowledge bases, RareBench provides a standardized framework for evaluating LLMs’ diagnostic capabilities.
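As a rough illustration of how the accuracy dimension of a differential-diagnosis benchmark could be scored, the sketch below computes a top-k hit rate: the fraction of patients whose confirmed disease appears among the model's first k ranked candidates. The article does not say which metric RareBench uses, so the function name `top_k_hit_rate`, the choice of metric, and the toy predictions are all assumptions made for illustration.

```python
from typing import List, Sequence


def top_k_hit_rate(
    ranked_predictions: Sequence[List[str]],
    confirmed: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose confirmed diagnosis is in the top-k candidates.

    Hypothetical helper: the metric is an assumption, not taken from RareBench.
    """
    if len(ranked_predictions) != len(confirmed):
        raise ValueError("need one ranked candidate list per confirmed diagnosis")
    if not confirmed:
        return 0.0
    hits = sum(
        1
        for candidates, truth in zip(ranked_predictions, confirmed)
        if truth in candidates[:k]
    )
    return hits / len(confirmed)


# Toy data for illustration only; these are not results from the study.
predictions = [
    ["Marfan syndrome", "Homocystinuria"],
    ["Fabry disease", "Gaucher disease"],
    ["Pompe disease", "Wilson disease"],
]
truths = ["Marfan syndrome", "Gaucher disease", "Cystic fibrosis"]

print(top_k_hit_rate(predictions, truths, k=1))
print(top_k_hit_rate(predictions, truths, k=2))
```

Reporting the score at several values of k reflects how a differential diagnosis is used in practice: a correct disease in second or third place is still useful to a clinician, even though it would count as a miss under strict top-1 accuracy.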

The development of RareBench is accompanied by the compilation of the largest open-source dataset on rare disease patients, establishing a benchmark for future studies in this domain. This dataset enables researchers to train and evaluate LLMs on a large-scale, real-world scenario, thereby facilitating the creation of more accurate and reliable diagnostic models.

Dynamic Few-Shot Prompt Methodology: Enhancing LLMs’ Diagnostic Performance

To facilitate differential diagnosis of rare diseases, we develop a dynamic few-shot prompt methodology. This approach leverages a comprehensive rare disease knowledge graph synthesized from multiple knowledge bases to significantly enhance LLMs’ diagnostic performance. By generating prompts that are tailored to the specific characteristics of each rare disease, our methodology enables LLMs to learn from a small number of labeled examples and generalize well to unseen cases.
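A minimal sketch of the general idea behind dynamic few-shot prompting follows: for each new patient, pick the most similar previously diagnosed cases and place them in the prompt as worked examples. The article does not describe how similarity is computed from the knowledge graph, so this sketch swaps in plain Jaccard overlap between phenotype sets as a stand-in. The helper names (`jaccard`, `select_examples`, `build_prompt`), the prompt layout, and the toy cases are hypothetical.

```python
from typing import List, Set, Tuple

# A solved case: (set of phenotype terms, confirmed diagnosis).
Case = Tuple[Set[str], str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two phenotype sets (assumed stand-in similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_examples(query: Set[str], pool: List[Case], k: int = 2) -> List[Case]:
    """Return the k solved cases most similar to the query patient."""
    ranked = sorted(pool, key=lambda case: jaccard(query, case[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: Set[str], pool: List[Case], k: int = 2) -> str:
    """Assemble a few-shot prompt whose examples depend on the query patient."""
    parts = []
    for phenotypes, diagnosis in select_examples(query, pool, k):
        parts.append(
            f"Patient phenotypes: {', '.join(sorted(phenotypes))}\n"
            f"Diagnosis: {diagnosis}"
        )
    # The final block is left open for the model to complete.
    parts.append(f"Patient phenotypes: {', '.join(sorted(query))}\nDiagnosis:")
    return "\n\n".join(parts)


# Toy cases for illustration only; not drawn from the RareBench dataset.
pool: List[Case] = [
    ({"tall stature", "arachnodactyly", "ectopia lentis"}, "Marfan syndrome"),
    (
        {"chronic cough", "recurrent lung infections", "pancreatic insufficiency"},
        "Cystic fibrosis",
    ),
    ({"tall stature", "ectopia lentis", "intellectual disability"}, "Homocystinuria"),
]
query = {"tall stature", "arachnodactyly", "ectopia lentis", "aortic dilation"}

print(build_prompt(query, pool, k=2))
```

Because the examples are chosen per patient rather than fixed in advance, the prompt changes with every query, which is what makes the few-shot setup "dynamic". A knowledge-graph-based similarity could replace `jaccard` without changing the rest of the sketch.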

Our experimental findings demonstrate the promising potential of integrating LLMs into the clinical diagnostic process for rare diseases. The results show that LLMs can achieve high accuracy in diagnosing rare diseases, even when compared to specialist physicians. This suggests that LLMs could be a valuable tool in supporting clinicians in their diagnosis and treatment of rare diseases.

Comparative Study: GPT4’s Diagnostic Capabilities vs. Specialist Physicians

To further evaluate the diagnostic capabilities of LLMs, we conduct an exhaustive comparative study between GPT4 and specialist physicians. In this head-to-head comparison, GPT4 achieves high diagnostic accuracy on rare diseases, which reinforces the case for LLMs as a supporting tool for clinicians.

The findings of our study have significant implications for the development of AI-powered diagnostic tools in the field of rare diseases. By leveraging the capabilities of LLMs, we can create more accurate and reliable diagnostic models that can support clinicians in their diagnosis and treatment of rare diseases. This paves the way for exciting future research directions, including the integration of LLMs into clinical workflows and the development of personalized medicine approaches.

Conclusion

In conclusion, our study demonstrates the potential of Large Language Models (LLMs) in diagnosing rare diseases. RareBench, a pioneering benchmark, systematically evaluates LLMs on four critical dimensions (accuracy, efficiency, robustness, and interpretability) and so provides a standardized framework for assessing their diagnostic capabilities.

Our experimental findings underscore the promising potential of integrating LLMs into the clinical diagnostic process for rare diseases. Together, the dynamic few-shot prompt methodology and the largest open-source dataset on rare disease patients make RareBench a valuable tool for researchers developing more accurate and reliable diagnostic models.

As we move forward, it is essential to continue exploring the capabilities of LLMs in diagnosing rare diseases. By leveraging the strengths of these AI-powered tools, we can create more effective diagnostic approaches that support clinicians in diagnosing and treating rare diseases.

Publication details: “RareBench: Can LLMs Serve as Rare Diseases Specialists?”
Publication Date: 2024-08-24
Authors: Xuanzhong Chen, Xiaohao Mao, Qihan Guo, Li Wang, et al.
DOI: https://doi.org/10.1145/3637528.3671576
