Google DeepMind is focusing its artificial intelligence research on a looming crisis in healthcare: the World Health Organization predicts a shortfall of more than 10 million health workers by 2030. Rather than attempting to replace clinicians, the company is developing an AI co-clinician designed to amplify doctors’ expertise and improve patient care; its AMIE system has already matched physicians’ performance in text-based simulated medical consultations. Alan Karthikesalingam, Vivek Natarajan and Pushmeet Kohli explain that the next evolution of healthcare delivery will involve ‘triadic care’, in which AI agents help patients through their care journeys under the clinical authority of their physician. The initiative prioritizes trustworthy, factually grounded assistance: in head-to-head blind evaluations the AI outperformed existing evidence synthesis tools, and in an objective analysis of 98 realistic primary care queries it produced responses free of critical errors in 97 cases.
AI Co-Clinician Addresses Projected Healthcare Worker Shortages
A predicted shortfall of 10 million health workers by 2030 is driving intensive research into artificial intelligence as a potential solution, but Google DeepMind is approaching the challenge with a specific focus on augmentation, not replacement. Researchers hypothesize that AI can act as an additional team member, extending clinicians’ reach while ensuring they retain judgment and control. To ensure reliability, the team adapted the “NOHARM” framework to rigorously test the AI for both “errors of commission” (incorrect information) and “errors of omission” (failure to surface critical information). Beyond simply synthesizing information, DeepMind investigated the AI’s ability to handle complex medication questions, evaluating it on the OpenFDA set of RxQA questions. The system made significant progress on these tests, surpassing other AI systems, particularly when questions were posed in the open-ended manner typical of real-world clinical interactions.
This advancement is crucial: clinicians require precision in medication guidance, a task that has historically been difficult for AI. The researchers note that medicine isn’t just text; it requires observation, listening, and verbal communication. This led them to explore multimodal capabilities, leveraging Gemini and Project Astra to simulate telemedical calls in which AI could assist with diagnosis and management under supervision. While expert physicians still outperformed the AI in identifying “red flags” and guiding critical examinations, the system performed comparably to or better than primary care physicians in 68 of the 140 assessed areas in a randomized, interface-blinded, crossover simulation study involving 120 hypothetical telemedical encounters, highlighting substantial promise while pinpointing areas for continued research.
NOHARM Framework Validates AI Evidence Synthesis for Physicians
Google DeepMind’s exploration of artificial intelligence as a co-clinician is directly responding to a projected critical shortage of healthcare professionals; the World Health Organization forecasts more than 10 million fewer health workers will be available globally by 2030, necessitating innovative solutions to maintain care quality. While previous medical AI efforts focused on demonstrating knowledge through examination-style tests, DeepMind has shifted toward validating the technology’s utility as a practical assistant for physicians, culminating in the development of AMIE, an AI system already demonstrating performance comparable to clinicians in text-based simulated consultations and undergoing trials in real-world settings. This focus on augmentation, rather than replacement, is central to their approach. A key element of DeepMind’s validation process has been the adaptation of the “NOHARM” framework, originally designed to assess the safety of new medical interventions, to evaluate the AI’s ability to synthesize clinical evidence.
In head-to-head blind evaluations, physicians consistently preferred the AI co-clinician’s responses over those of leading evidence synthesis tools. In an objective analysis of 98 realistic primary care queries, the system produced responses free of critical errors in 97 cases, outperforming two AI systems widely used by physicians. The methodology employed a detailed, multi-step process, refining queries with a panel of attending physicians and establishing specific metrics to ensure a rigorous assessment of clinical accuracy. The system demonstrated significant progress, particularly when responding to open-ended queries mirroring real-world clinical scenarios, outperforming other AI models. As the researchers put it, “For a physician, a tool is useful only if it is trustworthy and factually grounded,” underscoring the importance of reliable evidence synthesis in building clinician confidence. These findings suggest that AI is increasingly capable of mirroring human physicians’ proficiency in clinical reasoning, opening opportunities for continued advancement and integration into healthcare workflows.
We assessed over 140 aspects of consultation skill and found that expert physicians performed better than the AI system overall, particularly in identifying “red flags” and guiding critical physical examinations.
AI Co-Clinician Surpasses Benchmarks on OpenFDA RxQA Queries
Their research has moved beyond simply demonstrating medical knowledge recall, as evidenced by Med-PaLM, through achieving performance comparable to physicians in simulated consultations with AMIE, and now extends to testing in practical trial settings. This progression culminated in the development of an “AI co-clinician” intended to function as a collaborative member of the care team, operating under the authority of a physician, a model DeepMind terms “triadic care.” A key focus for the researchers has been establishing trustworthiness, recognizing that a useful tool for physicians must be factually grounded. The methodology involved a detailed, multi-step refinement process, with queries curated from diverse sources and assessed by a panel of attending physicians to ensure clinical accuracy. Beyond synthesizing evidence, DeepMind turned to the challenging benchmark of medication knowledge and reasoning: the OpenFDA set of RxQA questions.
While previous AI systems struggled with these complex queries, particularly when posed in open-ended, real-world language, the AI co-clinician demonstrated “significant progress,” surpassing other models. The researchers noted, “On this more realistic clinical task of open-ended question-answering about medications, AI co-clinician outperforms available models.” This success builds on earlier work showing AI’s ability to achieve physician-level proficiency in aspects of clinical reasoning, opening avenues for further improvement and highlighting the potential for AI to assist clinicians navigating increasingly data-intensive care planning and management.
Medicine has always been a team sport, and AI agents can bring more teammates onto the field: extending clinicians’ reach while ensuring they retain judgment and control.
Multimodal AI Demonstrates Telemedical Skill with Gemini & Astra
Recognizing that AI’s potential remains unrealized without meeting the needs of both clinicians and patients, researchers have focused on augmenting, not replacing, medical expertise, building systems designed to function collaboratively under physician oversight. This approach, termed “triadic care,” envisions AI agents extending clinicians’ reach while preserving their critical judgment. Beyond evidence synthesis, the AI demonstrated progress in complex medication knowledge, surpassing other models when responding to open-ended questions mirroring real-world clinical interactions. Further research leveraged Gemini and Project Astra to test the AI’s ability to engage with patients via live audio and video, simulating telemedical calls; working with physicians at Harvard and Stanford, the team conducted a randomized, interface-blinded, crossover simulation study with 20 synthetic clinical scenarios and 10 “patient-actors.” The AI successfully guided patients through physical examinations, correcting inhaler technique and identifying potential rotator cuff injuries. Researchers noted that these high-fidelity simulations more rigorously evaluate that premise, positioning AI as a supportive tool rather than a replacement for clinical judgment and highlighting both substantial promise and areas for further advancement in medical AI.
While our results show significant improvements for AI systems’ MCQ performance in the openly available (OpenFDA) set of RxQA, clinicians’ needs in the real-world present as open-ended questions rather than a need to identify the correct answer from pre-determined options.
