Robotic Hands Gain Adaptable Designs for Varied Tasks

Researchers are tackling the limited generalisation of dexterous manipulation policies, which typically rely on fixed hand designs. Zhenyu Wei, Yunchao Yao, and Mingyu Ding, all from the University of North Carolina at Chapel Hill, present a parameterised canonical representation that unifies a wide range of dexterous hand architectures. The work introduces a unified parameter space and a canonical URDF format, allowing learning algorithms to condition on morphological and kinematic variations and enabling smooth transitions between hand designs. Crucially, the standardised action space supports efficient policy learning across different hand structures, demonstrated through grasp policy replay, VAE latent encoding, and zero-shot transfer experiments, including an 81.9% success rate on a 3-finger LEAP Hand in simulation and real-world tasks. By unifying both the representational and action spaces of structurally diverse hands, the framework offers a scalable foundation for universal dexterous manipulation.

For decades, building a robot hand capable of truly flexible manipulation has remained a major engineering challenge. The researchers' parameterised canonical representation, which unifies a broad spectrum of dexterous hand architectures, promises more adaptable and capable robotic grippers, bringing us closer to machines that can handle objects as deftly as humans.

This representation addresses limitations encountered when working with robots possessing varied kinematic and structural layouts. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms.
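
To make the idea concrete, here is a minimal Python sketch of what such a unified parameter space might look like. The field names, the five-finger/four-joint canonical limits, and the zero-padding scheme are our own illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass

# Hypothetical per-finger entry in a canonical parameter space.
@dataclass
class FingerParams:
    n_joints: int        # degrees of freedom in this finger
    link_lengths: list   # metres, proximal to distal
    base_position: tuple # finger-base offset in the palm frame (x, y, z)

@dataclass
class CanonicalHand:
    name: str
    fingers: list

    def to_vector(self, max_fingers=5, max_joints=4):
        """Flatten to a fixed-length vector so hands with different
        finger and joint counts share one learnable parameter space.
        Slots for missing fingers or joints are zero-padded."""
        vec = []
        for i in range(max_fingers):
            if i < len(self.fingers):
                f = self.fingers[i]
                joints = f.link_lengths + [0.0] * (max_joints - len(f.link_lengths))
                vec += [float(f.n_joints)] + joints[:max_joints] + list(f.base_position)
            else:
                vec += [0.0] * (1 + max_joints + 3)
        return vec

# Example: a 2-finger gripper embedded in the 5-finger canonical space.
hand = CanonicalHand("toy_gripper", [
    FingerParams(2, [0.04, 0.03], (0.02, 0.0, 0.0)),
    FingerParams(2, [0.04, 0.03], (-0.02, 0.0, 0.0)),
])
vec = hand.to_vector()
print(len(vec))  # 5 fingers * (1 + 4 + 3) slots = 40
```

Because every hand maps to the same fixed-length vector, a learning algorithm can condition on morphology without per-hand architectural changes.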

A structured latent manifold can be learned over this space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. The canonical URDF facilitates standardised representation and manipulation of hand designs.
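
A rough sketch of that interpolation idea follows. The `encode`/`decode` functions below are identity stand-ins for a trained VAE encoder and decoder, included only so the example runs; the real model and its latent dimensionality are not specified here:

```python
import numpy as np

def encode(params):   # stand-in for a trained VAE encoder
    return np.asarray(params, dtype=float)

def decode(z):        # stand-in for the matching VAE decoder
    return z

def interpolate_morphologies(params_a, params_b, steps=5):
    """Linearly interpolate between two latent codes and decode each
    point, yielding a sequence of intermediate hand morphologies."""
    za, zb = encode(params_a), encode(params_b)
    return [decode((1 - t) * za + t * zb) for t in np.linspace(0.0, 1.0, steps)]

# Example: blend a short-fingered and a long-fingered parameter vector.
seq = interpolate_morphologies([0.03, 0.02], [0.05, 0.04], steps=3)
print(seq[1])  # midpoint → [0.04, 0.03]
```

With a well-structured latent manifold, each intermediate decode corresponds to a physically plausible hand rather than an arbitrary blend of numbers.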

Grasp performance benchmarks and transferability across robotic hand platforms

The unified grasp success rate of 84.20% on the Allegro Hand demonstrates the effectiveness of the canonical representation. The Barrett Hand reached 88.10%, while the Shadow Hand achieved 62.90%, indicating varying degrees of transfer across hand designs. These results, obtained with a lightweight model and a 10-step DDIM sampler, come at an inference time of only 0.13 seconds, establishing a new benchmark for efficiency.

Comparison against established methods like DFC (76.2%, 86.3%, 58.8% success rates) and GenDexGrasp (51.0%, 67.0%, 54.2% success rates) reveals that the current work achieves comparable grasp success with markedly faster processing. However, discrepancies emerged when transferring policies to the Allegro Hand, with success rates dropping to 71.60% from the original 84.20% when using the canonical URDF.

This reduction stems from the omission of an axial-rotation joint in the canonical URDF representation of the Allegro Hand, creating a structural mismatch. Still, the bidirectional mappings between canonical and original URDF spaces largely preserve action semantics, as evidenced by closely matched success rates during policy transfer. Cumulative Rotation values for the Shadow Hand using the original URDF were 9.09 radians, while the canonical representation yielded 10.92 radians, suggesting comparable rotational control.

Examining in-hand reorientation reveals that Steps-to-Fall for the LEAP hand using the original URDF averaged 397.62, decreasing to 326.98 when employing the canonical representation. This indicates that the canonical parameterization preserves essential manipulation dynamics, allowing for stable grasping and control. Once trained on the unified representation, the VAE captures continuous morphological relationships, as demonstrated by the preservation of thumb placement, degrees of freedom, and other key kinematic parameters.
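
As an illustration of how a cumulative-rotation metric like the one above might be computed, here is a small sketch that sums the angle between consecutive object orientations, assuming unit quaternions `(w, x, y, z)` are logged at each control step; the paper's exact metric definition may differ:

```python
import numpy as np

def cumulative_rotation(quats):
    """Sum the absolute rotation angle (radians) between consecutive
    unit quaternions along a trajectory."""
    total = 0.0
    for q1, q2 in zip(quats[:-1], quats[1:]):
        dot = abs(np.dot(q1, q2))  # |cos(theta/2)|, invariant to quaternion sign
        total += 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))
    return total

# Example: three successive 90-degree steps about the z-axis
# accumulate 3 * pi / 2 radians in total.
s, c = np.sin(np.pi / 4), np.cos(np.pi / 4)
steps = [np.array([1.0, 0, 0, 0.0]),   # identity
         np.array([c, 0, 0, s]),       # 90 degrees about z
         np.array([0.0, 0, 0, 1.0]),   # 180 degrees about z
         np.array([-c, 0, 0, s])]      # 270 degrees about z
print(round(cumulative_rotation(steps), 3))  # ≈ 4.712
```

A higher cumulative rotation at a comparable steps-to-fall count indicates the hand is actively reorienting the object rather than merely holding it.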

Beyond this, unified training consistently outperformed embodiment-specific training, with success rates of 84.2%, 88.1%, and 62.9% (Allegro, Barrett, and Shadow, respectively) compared to 82.1%, 87.6%, and 55.4%, highlighting the benefits of shared learning across morphologies. For zero-shot grasping on unseen LEAP Hand variants, the policy achieved an 81.9% success rate, demonstrating strong generalisation capabilities.

Inside the zero-shot evaluation, models conditioned on hand morphology consistently outperformed those without, further validating the representation’s capacity for cross-embodiment generalisation. By learning within a shared action space, hands with differing kinematics benefit from data collected from other hand designs. Under the established evaluation protocol, the model processes grasps in 0.13 seconds, a substantial improvement over the >1800 seconds required by the DFC method.

A unified canonical representation for zero-shot dexterous hand manipulation

Scientists propose a canonical representation for dexterous hands that standardizes diverse morphologies and kinematic structures into a unified parameterised format, enabling consistent and learning-friendly structural encoding across hands. They validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer.

Specifically, the researchers train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalises across dexterous hands. The policy transfers to unseen morphologies without additional fine-tuning, achieving an 81.9% zero-shot success rate and strong grasping performance in both simulation and real-world experiments.

The main contributions are threefold. First, a canonical representation for dexterous hands that standardises diverse morphologies and kinematic structures into a unified parameterised format, enabling consistent and learning-friendly structural encoding across hands. Second, extensive experiments, including morphology latent interpolation, grasp policy replay, in-hand reorientation, and unseen cross-hand grasping, demonstrating that the canonical format faithfully preserves functional behaviour while providing a unified action space for effective zero-shot policy transfer across embodiments. Third, an interpretable, interpolable, and scalable representation foundation that enables joint policy training for cross-embodiment dexterous manipulation, paving the way for unified, large-scale, morphology-aware learning.

High-DoF robotic hands provide the articulation needed for fine-grained contact control and multi-stage manipulation, supporting tasks from stable grasping to active in-hand reconfiguration and tool-mediated interactions.

These capabilities have been extensively explored through analytic models and data-driven learning, producing strong performance on individual embodiments. However, such policies are optimised for the specific kinematics, actuation, and workspace of a single hand, becoming tightly coupled to that embodiment. This embodiment-specific specialisation limits transfer to other hands and prevents the reuse of demonstrations across heterogeneous hardware, leaving progress fragmented across isolated hand designs rather than advancing toward methods that generalise to new embodiments.

Recent work has increasingly explored how to share manipulation abilities across robotic hands with distinct morphologies. Much of this progress has centred on grasping. One line of work focuses on intermediate grasp representations that abstract away embodiment-specific kinematics. These methods use representations such as interaction-centric fields or contact patterns to enable grasp transfer across different robotic hands, but remain largely confined to grasp synthesis and do not naturally extend to sequential manipulation skills.

Beyond grasping, some approaches target more general cross-embodiment behaviours through higher-level or embodiment-agnostic interfaces. Human-centric representations treat the human hand as a universal manipulation prior but typically assume human-like kinematics and require specialised hardware mappings. Particle-based dynamics learning provides an alternative by representing hands and objects as particle systems, yet is mainly applicable to structurally similar hands and constrained manipulation tasks.

Despite these efforts, a unified cross-embodiment model capable of supporting general manipulation across heterogeneous robotic hands is still lacking. To enable learning policies that generalise across dexterous hands with different morphologies, researchers propose a parameterised canonical hand representation, which serves as the foundation for cross-embodiment manipulation.

The goal is to express diverse robotic hands within a unified structural and kinematic framework that can be efficiently processed by learning-based models. They begin by motivating the need for a canonical representation, highlighting the limitations of existing URDF formats and the challenges of defining a consistent and learnable description for heterogeneous dexterous hands.

They then present the design of the canonical URDF, which captures shared human-inspired kinematic structure while enforcing consistent coordinate conventions. Next, they define the canonical parameter set that encodes key morphological and kinematic properties in a compact and interpretable form. They further describe the automatic parsing process that converts arbitrary hand URDFs into this canonical parameterization and generates standardized URDF models. Finally, they establish a unified action space that aligns control dimensions across hands with different degrees of freedom, enabling a single policy to act consistently over diverse embodiments.
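
The unified action space described above can be sketched as a fixed canonical joint ordering that actions are scattered into and gathered from. The joint-naming scheme and the 20-slot layout below are our own illustrative assumptions, not the paper's actual convention:

```python
import numpy as np

# Hypothetical canonical ordering: 5 fingers * 4 joints = 20 control slots.
CANONICAL_JOINTS = [f"finger{f}_joint{j}" for f in range(5) for j in range(4)]

def to_canonical(action, joint_names):
    """Scatter a hand-specific action into the fixed 20-dim canonical
    vector; slots for joints this hand lacks stay at zero."""
    canon = np.zeros(len(CANONICAL_JOINTS))
    for value, name in zip(action, joint_names):
        canon[CANONICAL_JOINTS.index(name)] = value
    return canon

def from_canonical(canon, joint_names):
    """Gather back only the joints this hand actually has."""
    return np.array([canon[CANONICAL_JOINTS.index(n)] for n in joint_names])

# Example: a 2-joint hand round-trips through the canonical space losslessly.
names = ["finger0_joint0", "finger1_joint1"]
a = np.array([0.3, -0.2])
c = to_canonical(a, names)
print(from_canonical(c, names))  # [ 0.3 -0.2]
```

Because every hand reads and writes the same 20-dimensional vector, a single policy can emit one action format and have each embodiment consume only the dimensions it can actuate.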

A unified parameter space enables transfer learning between diverse robotic hand designs

For years, robotic dexterity has been hampered by a simple truth: hands are not interchangeable. Building a grasping policy for one robotic hand rarely translates to another, even if the tasks appear similar. This work addresses that limitation by creating a common language for hand morphology, allowing algorithms to learn skills applicable across a wider range of designs.

Instead of treating each hand as a unique case, researchers have devised a way to represent them within a unified parameter space. Yet, achieving this universality is not merely a matter of clever coding. Previous attempts often struggled with the loss of detail when simplifying complex hand structures, or failed to account for the subtle interaction between kinematics and dynamics.

Now, a parameterised representation and a standardized data format offer a means of preserving essential properties while enabling meaningful comparisons between different hand designs. As a result, a policy trained on one hand can be transferred, with considerable success, to hands it has never encountered. However, the reliance on simulation remains a significant hurdle.

While real-world tests demonstrate promising zero-shot transfer rates, the gap between simulated and physical environments is well known. Beyond this, the current framework focuses primarily on grasping, leaving more general sequential manipulation skills for future work.

👉 More information
🗞 One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
🧠 ArXiv: https://arxiv.org/abs/2602.16712

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
