A recent study investigated the reliability of Microsoft’s Bing AI copilot, a popular chatbot, in providing accurate drug information to patients. Led by researchers including WA, FK, RM, and HFN, the study analyzed the chatbot’s responses to 10 common patient questions about the 50 most frequently prescribed drugs in the US outpatient market.
The team used a virtual private network (VPN) set to a New York location to standardize prompts and keep the chatbot’s answers generalizable. They found that the chatbot’s references were often unreliable and that its answers were difficult to read, requiring a college-graduate level of education to understand.
The study also assessed the completeness and accuracy of the chatbot’s responses, comparing them to a reference database created by a clinical pharmacist and a physician with expertise in pharmacology. The results raise concerns about the trustworthiness of AI-powered health information sources like Bing’s chatbot.
Study Design and Methods
The study followed the STROBE reporting guideline and was conducted with Microsoft’s Bing AI copilot for the web. The researchers used a VPN with a New York location to standardize prompts and ensure that the chatbot’s answers were generalizable. They deleted web cookies before each prompt and entered each prompt in a separate browser window to avoid any influence from previous browsing activity or earlier chatbot conversations.
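The paper does not say whether prompt entry was automated. Purely to make the session-isolation idea concrete (a fresh, cookie-free browser context per prompt, with prompts built from a fixed question list), here is a minimal Python sketch using Playwright; the URL, prompt template, and drug/question lists are placeholders, and the actual chat interaction is omitted because it is site-specific.

```python
from playwright.sync_api import sync_playwright

# Placeholder inputs -- the real study used the 50 most prescribed drugs
# and 10 patient questions; these entries are only for illustration.
DRUGS = ["atorvastatin", "metformin"]
QUESTIONS = ["What is {drug} used for?", "What side effects does {drug} have?"]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    for drug in DRUGS:
        for template in QUESTIONS:
            prompt = template.format(drug=drug)
            # A brand-new browser context starts with no cookies or history,
            # mirroring the study's "delete cookies, separate window" setup.
            context = browser.new_context()
            page = context.new_page()
            page.goto("https://www.bing.com/chat")  # illustrative URL only
            # Entering the prompt and capturing the chatbot's answer is
            # site-specific (selectors change often) and is omitted here.
            print(f"Would submit: {prompt!r}")
            context.close()
    browser.close()
```

Isolating each prompt in its own context is what keeps one answer from being colored by earlier conversations, which is the point of the study’s cookie-deletion and separate-window procedure.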
Drug Selection and Patient Questions
The study covered the 50 most frequently prescribed drugs in the US outpatient market in 2020, including six non-prescription medications. The researchers selected 10 patient questions about medication information, drawing on literature reviews, patient guidelines, and expert opinions from clinical pharmacists and physicians.
Assessment of References, Readability, Completeness, and Accuracy
The study assessed the reliability of the chatbot’s references by documenting the websites cited in its answers. Readability was evaluated with the Flesch Reading Ease Score, which maps a text to the educational level required to understand it. To assess completeness and accuracy, the researchers matched statements in each chatbot answer against the expert-created reference database described above.
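The paper’s scoring code is not reproduced here; the snippet below is only a minimal sketch of the standard Flesch Reading Ease formula, with a deliberately rough vowel-group syllable counter (published analyses typically rely on a dedicated readability library). Scores run from roughly 0 to 100, and low scores (below about 30) correspond to text best understood by college graduates.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels.
    # Real syllable counting is more involved; this is illustrative only.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Standard formula:
    # 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Hypothetical chatbot-style answer, used only to demonstrate the metric.
answer = ("Metformin may cause gastrointestinal side effects such as nausea, "
          "which often improve when the drug is taken with food.")
print(f"Flesch Reading Ease: {flesch_reading_ease(answer):.1f}")
```

Lower scores mean harder text, which is how a chatbot answer can end up requiring a college-graduate reading level even when its content is correct.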
Key Findings
As reported above, the chatbot’s cited references were often unreliable, and its answers were difficult to read, typically requiring a college-graduate level of education to understand. The assessments of completeness and accuracy against the expert reference database add further insight into the strengths and limitations of the chatbot’s performance in answering patient questions about medications.
Implications and Future Directions
This study has significant implications for patient safety and healthcare outcomes. If the chatbot is found to provide inaccurate or incomplete information, it could lead to medication errors, adverse reactions, or other harm to patients. Conversely, if the chatbot demonstrates high accuracy and reliability, it could become a valuable resource for patients seeking drug information.
Future studies should investigate the generalizability of these findings across different AI-powered chatbots, languages, and cultural contexts. Additionally, researchers should explore strategies to improve the performance of these chatbots, such as integrating them with electronic health records or providing additional training data on medication information.
