NVIDIA Leads Collaboration Building Foundation for Physical AI in Healthcare

NVIDIA is spearheading a collaboration that has yielded Open-H-Embodiment, the first open dataset designed to accelerate the development of physical AI in healthcare robotics. Recognizing that existing healthcare AI has largely focused on interpreting data rather than “doing,” the initiative addresses a critical need for standardized robotics data encompassing embodiment, dynamics, and control. The dataset, built by a community spanning 35 organizations including Johns Hopkins University and Technical University of Munich, comprises 778 hours of training data from surgical robotics, ultrasound, and colonoscopy procedures. Released alongside Open-H-Embodiment are two new permissively licensed, open-source models, including GR00T-H, described as the first policy model for surgical robotics tasks, trained on approximately 600 hours of the new data.

Open-H-Embodiment Dataset: Community Collaboration & Data Composition

A new, expansive dataset is challenging the limitations of healthcare robotics by prioritizing physical action over mere perception. Open-H-Embodiment, unveiled by a collaboration spanning 35 organizations, addresses a critical gap in the field: existing datasets largely focus on interpreting signals, neglecting the essential element of physical action and interaction within a healthcare setting. The initiative recognizes that healthcare demands robots capable of performing tasks, requiring data that incorporates embodiment, contact dynamics, and closed-loop control, elements absent from previous standardized resources. The project began with a steering committee including Prof. Axel Krieger (Johns Hopkins), Prof. Nassir Navab (Technical University of Munich), and Dr. Mahdi Azizian (NVIDIA), and quickly grew into a global effort to construct a large-scale dataset for advancing physical AI in healthcare robotics.

Researchers utilized a diverse range of robotic platforms, including commercial systems from CMR Surgical, Rob Surgical, and Tuodao, alongside research robots like dVRK, Franka, and Kuka, ensuring broad applicability. This comprehensive collection is designed to foster the development of AI autonomy and world foundation models, rather than simply being a repository of data. The authors explain that healthcare AI has mainly been perception-based, focusing on models that interpret signals and classify or segment pathology/anatomy, highlighting a shift towards more active, embodied intelligence.

GR00T-H: Vision Language Action Model for Surgical Robotics

The pursuit of truly autonomous surgical robots has long been hampered by a reliance on perception-based AI: systems adept at interpreting images but lacking the capacity to physically act. Existing datasets, which lacked synchronized robot motion, force, and visual data, proved inadequate for developing the “Physical AI” these complex tasks demand. Now, NVIDIA has introduced GR00T-H, a vision-language-action model designed to bridge this gap, representing a significant step toward robotic surgery with greater dexterity and precision. Developers implemented four key architectural choices to overcome these hurdles, including “Unique Embodiment Projectors,” learnable components that normalize action spaces across different robotic systems. The team also employed “State Dropout (100%),” deliberately withholding proprioceptive state input so the policy learns to rely on visual observations, a bias that ultimately improved performance in real-world scenarios.
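The article does not publish GR00T-H’s code, but the two ideas above can be sketched concretely. The minimal PyTorch sketch below is illustrative only: the module names, dimensions, and fusion architecture are assumptions, not the actual GR00T-H design. Each embodiment gets its own learnable projector and action head around a shared core, while full-strength state dropout zeroes the proprioceptive embedding during training.

```python
import torch
import torch.nn as nn

class MultiEmbodimentPolicy(nn.Module):
    """Hypothetical sketch: per-embodiment projectors around a shared core."""
    def __init__(self, native_dims: dict, vision_dim: int = 512,
                 hidden: int = 256, state_dropout_p: float = 1.0):
        super().__init__()
        # One learnable projector/head pair per embodiment (e.g. dVRK, Franka),
        # normalizing each robot's native state/action dimensionality.
        self.state_proj = nn.ModuleDict(
            {k: nn.Linear(d, hidden) for k, d in native_dims.items()})
        self.action_head = nn.ModuleDict(
            {k: nn.Linear(hidden, d) for k, d in native_dims.items()})
        self.fuse = nn.Linear(vision_dim + hidden, hidden)
        self.state_dropout_p = state_dropout_p  # 1.0 == "State Dropout (100%)"

    def forward(self, embodiment: str, vision_feats, proprio_state):
        state_emb = self.state_proj[embodiment](proprio_state)
        # At p = 1.0 the proprioceptive embedding is always zeroed during
        # training, forcing the policy to act from visual features alone.
        if self.training and torch.rand(()) < self.state_dropout_p:
            state_emb = torch.zeros_like(state_emb)
        h = torch.relu(self.fuse(torch.cat([vision_feats, state_emb], dim=-1)))
        return self.action_head[embodiment](h)  # action in the robot's own space

# A 7-DoF dVRK arm and an 8-DoF Franka sharing one policy core:
policy = MultiEmbodimentPolicy({"dvrk": 7, "franka": 8})
action = policy("dvrk", torch.randn(1, 512), torch.randn(1, 7))
```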

The effectiveness of GR00T-H is already demonstrable; a prototype has successfully completed a full end-to-end suture in the SutureBot benchmark, showcasing robust long-horizon dexterity. This achievement highlights the potential for the model to move beyond simple, pre-programmed motions toward more complex and adaptable surgical procedures. The development of GR00T-H, alongside the Cosmos-H-Surgical-Simulator, signifies a shift toward building foundation models capable of reasoning and adapting within the demanding environment of a surgical setting, allowing robotic systems to skillfully act rather than simply see.

Cosmos-H-Surgical-Simulator: Physics-Based Synthetic Data Generation

NVIDIA’s Cosmos-H-Surgical-Simulator is addressing a critical bottleneck in healthcare robotics: the creation of realistic training data. Traditional surgical simulators struggle to accurately replicate the complexities of real-world procedures, including the behavior of soft tissue, reflections, and even the presence of blood and smoke, which limits their effectiveness in training AI systems. The simulator, a World Foundation Model (WFM), overcomes these limitations by generating physically plausible surgical video directly from kinematic actions, effectively bridging the gap between simulation and reality. Built upon the NVIDIA Cosmos Predict 2.5 2B foundation model, Cosmos-H-Surgical-Simulator implicitly learns tissue deformation and tool interaction from data, offering a significant advantage over conventional methods. This approach allows for the creation of synthetic video-action pairs, augmenting datasets where real-world data is scarce.
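As a rough mental model of how such video-action pairs come about, the Python sketch below rolls out an action-conditioned world model autoregressively. The `rollout_world_model` helper and its `predict_next` call are hypothetical stand-ins for whatever interface the WFM exposes, not the actual Cosmos API.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    frames: list   # generated surgical video frames
    actions: list  # the kinematic actions that conditioned them

def rollout_world_model(world_model, start_frame, action_sequence):
    """Autoregressively predict frames conditioned on each kinematic action.

    `world_model.predict_next` is a placeholder: tissue deformation and tool
    contact are handled implicitly by the learned model, not by an explicit
    physics engine.
    """
    frames = [start_frame]
    for action in action_sequence:
        frames.append(world_model.predict_next(frames[-1], action))
    # Drop the seed frame so frames[i] is the result of actions[i].
    return Rollout(frames=frames[1:], actions=list(action_sequence))
```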

The efficiency gains are substantial; the team reported that 600 simulation rollouts took only 40 minutes, a dramatic reduction compared to the two days required for equivalent benchtop experiments. The model’s development involved fine-tuning on the Open-H-Embodiment dataset, encompassing nine robot embodiments and 32 datasets, using 64 A100 GPUs for approximately 10,000 GPU-hours. It operates within a unified 44-dimensional action space, streamlining training across embodiments. According to the developers, the simulator’s capabilities extend beyond mere visual fidelity; it is designed to function as a physics simulator, learning from data to produce realistic interactions. This advancement is poised to accelerate the development of more robust and reliable AI systems for surgical robotics, potentially enabling increasingly autonomous procedures.
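The article only states the dimensionality of the unified action space, not its layout, but a common way to unify heterogeneous action spaces is to embed each robot’s native action vector into a fixed-width vector with a validity mask. The NumPy sketch below is purely illustrative; `to_unified`, the `offset` parameter, and the layout are assumptions.

```python
import numpy as np

UNIFIED_DIM = 44  # unified action-space width reported by the team

def to_unified(native_action: np.ndarray, offset: int = 0):
    """Embed a robot's native action vector into the shared 44-dim space.

    `offset` (hypothetical) would come from a per-embodiment layout table;
    the boolean mask tells the model which dimensions are live.
    """
    unified = np.zeros(UNIFIED_DIM, dtype=np.float32)
    mask = np.zeros(UNIFIED_DIM, dtype=bool)
    unified[offset:offset + native_action.size] = native_action
    mask[offset:offset + native_action.size] = True
    return unified, mask

# e.g. a 7-DoF pose-plus-gripper command from a dVRK arm:
action_44, live = to_unified(np.random.randn(7).astype(np.float32))
```

A fixed-width vector with a mask like this would let one model consume actions from all nine embodiments without per-robot output formats, which is consistent with, though not confirmed by, the unified action space the developers describe.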
