HMRMamba Achieves Robust 3D Human Mesh Recovery with Geometry-Aware Lifting Modules

Researchers are tackling the persistent problem of unrealistic human pose estimation in video, a challenge frequently encountered in 3D Human Mesh Recovery (HMR). Hongjun Chen, Huan Zheng, and Wencheng Han, all from the State Key Laboratory of Internet of Things and Sensing Technology (SKL-IOTSC), CIS, University of Macau, alongside Jianbing Shen and colleagues, present a new framework, HMRMamba, that significantly improves the accuracy and physical plausibility of reconstructed human meshes. This work is notable for its pioneering use of Structured State Space Models, specifically a dual-scan Mamba architecture, to create a more robust and geometrically grounded 3D pose estimation process, and for explicitly modelling motion patterns to enhance temporal consistency, even in challenging conditions like occlusion or motion blur. Demonstrating state-of-the-art performance on standard benchmarks, HMRMamba promises to advance applications requiring realistic and reliable 3D human pose analysis.

The research team tackled these inaccuracies, stemming from flawed 3D pose anchors and inadequate modelling of complex spatiotemporal dynamics, by pioneering the use of Structured State Space Models (SSMs) for their efficiency and long-range modelling prowess. This innovative framework is built around two core contributions: a Geometry-Aware Lifting Module and a Motion-guided Reconstruction Network, both designed to create more robust and realistic human mesh reconstructions from video data. The Geometry-Aware Lifting Module employs a novel dual-scan Mamba architecture to directly integrate geometric cues from image features into the 2D-to-3D pose lifting process, generating a highly reliable 3D pose sequence that serves as a stable anchor for subsequent reconstruction stages.
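The dual-scan idea can be illustrated with a toy linear state-space recurrence run over the sequence in both temporal directions and fused. This is a deliberately simplified sketch, not the paper's method: a plain linear SSM with fixed matrices and mean fusion stands in for the selective Mamba blocks, and every name and dimension below is illustrative.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear state-space scan: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    x: (T, d) input sequence; A: (n, n) state transition;
    B: (n, d) input map; C: (d, n) output map.
    """
    T, d = x.shape
    h = np.zeros(A.shape[0])
    ys = np.empty((T, d))
    for t in range(T):
        h = A @ h + B @ x[t]
        ys[t] = C @ h
    return ys

def dual_scan(x, A, B, C):
    """Scan the sequence forward and backward in time, then fuse the two
    passes, so each output frame sees both past and future context."""
    fwd = ssm_scan(x, A, B, C)
    bwd = ssm_scan(x[::-1], A, B, C)[::-1]
    return 0.5 * (fwd + bwd)
```

A real Mamba block makes A, B, and C input-dependent (the "selective" part) and interleaves the scan with gating and projections; the sketch only shows why a bidirectional scan gives every frame access to the whole sequence.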

The team achieved this by grounding the pose lifting process with geometric information, creating a robust foundation for accurate 3D pose estimation. This module’s dual-scan mechanism and temporal evolution capabilities enhance its ability to perceive accurate geometry, particularly in challenging scenarios. Subsequently, the Motion-guided Reconstruction Network leverages this stable anchor to explicitly process kinematic patterns over time, injecting crucial temporal awareness into the reconstruction process. By explicitly modelling the temporal dynamics of human motion, the network significantly enhances the coherence and robustness of the final mesh, especially when dealing with occlusions and motion blur.
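One simple way to make kinematic patterns explicit, in the spirit of the Motion-guided Reconstruction Network described above, is to augment a 3D joint sequence with finite-difference velocity and acceleration. The paper does not spell out its exact motion features, so this is only a hedged sketch of the general idea:

```python
import numpy as np

def kinematic_features(joints):
    """Given a (T, J, 3) sequence of 3D joints, derive per-frame velocity
    and acceleration by finite differences along the time axis. These are
    simple explicit motion cues of the kind a motion-guided network can
    condition on to keep the mesh temporally coherent."""
    vel = np.gradient(joints, axis=0)   # first temporal derivative
    acc = np.gradient(vel, axis=0)      # second temporal derivative
    return np.concatenate([joints, vel, acc], axis=-1)  # (T, J, 9)
```

Features like these let a network penalise frames whose implied velocity or acceleration is inconsistent with the neighbouring frames, which is exactly where occlusion and motion blur tend to produce implausible jumps.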

This approach moves beyond simply predicting pose parameters; it actively incorporates the natural flow of human movement into the 3D mesh creation. Comprehensive evaluations on established benchmarks (3DPW, MPI-INF-3DHP, and Human3.6M) confirm that HMRMamba establishes a new state of the art in HMR, outperforming existing methods in both reconstruction accuracy and temporal consistency while offering superior computational efficiency. In particular, the framework handles complex motions and occlusions markedly better than prior work, producing more realistic and physically plausible human meshes. This opens new possibilities for applications in human-computer interaction, virtual reality, the metaverse, and robotics, where accurate and temporally coherent 3D human models are essential.

Researchers engineered the lifting module to ground pose estimation in geometric understanding, improving the accuracy of the initial 3D pose, while the Motion-guided Reconstruction Network leverages the resulting anchor to explicitly process kinematic patterns over time, enhancing mesh coherence and robustness. Experiments involved feeding image feature sequences and 2D keypoint sequences through the HMRMamba network, which the team designed to inject temporal awareness and thereby improve performance under occlusion and motion blur. Concretely, the dual-scan Mamba architecture within the Geometry-Aware Lifting Module processes image features to predict SMPL parameters: 72 pose parameters and 10 shape parameters that together define a 6,890-vertex mesh.
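The SMPL interface mentioned above (72 pose parameters, 10 shape parameters, 6,890 vertices) can be sketched as follows. The linear map here is a random placeholder standing in for the real SMPL model, which applies shape and pose blend shapes followed by linear blend skinning; only the parameter shapes match the real thing.

```python
import numpy as np

# SMPL dimensions: 72 pose values (24 joints x 3 axis-angle components),
# 10 shape coefficients, 6,890 mesh vertices.
N_POSE, N_SHAPE, N_VERTS = 72, 10, 6890

rng = np.random.default_rng(0)
W_pose = rng.normal(size=(N_VERTS * 3, N_POSE)) * 0.01   # placeholder weights
W_shape = rng.normal(size=(N_VERTS * 3, N_SHAPE)) * 0.01  # placeholder weights

def smpl_like(pose, shape):
    """Toy linear stand-in for the SMPL forward pass: maps (72,) pose and
    (10,) shape parameters to a (6890, 3) vertex array."""
    assert pose.shape == (N_POSE,) and shape.shape == (N_SHAPE,)
    verts = W_pose @ pose + W_shape @ shape
    return verts.reshape(N_VERTS, 3)
```

The point of the sketch is the interface: a regressor like HMRMamba only has to predict 82 numbers per frame, and the body model expands them into a full mesh.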

The Motion-guided Reconstruction Network then refines this initial mesh by analysing temporal patterns in the video sequence. Evaluations on the 3DPW, MPI-INF-3DHP, and Human3.6M benchmarks show that HMRMamba surpasses existing methods in both reconstruction accuracy and temporal consistency, with its use of SSMs delivering both efficiency and long-range modelling capability.

Tests confirm the lifting module's ability to accurately ground the pose lifting process in geometry, providing a highly dependable foundation for subsequent mesh generation. The team measured a substantial improvement in the stability of the 3D pose sequence compared with existing methods, particularly in challenging scenarios involving occlusion and motion blur. The Motion-guided Reconstruction Network then utilises this anchor to explicitly process kinematic patterns over time.

By injecting crucial temporal awareness, the system significantly enhances the coherence and robustness of the final mesh. Data show that this approach curbs implausible estimations for occluded body parts, a common failing of previous techniques, and researchers recorded more physically realistic poses and body sizes even under complex motion and significant occlusion. Comprehensive evaluations on the 3DPW, MPI-INF-3DHP, and Human3.6M benchmarks confirm that HMRMamba sets a new state-of-the-art standard, delivering superior reconstruction accuracy and temporal consistency at a computational cost of 7.88 GFLOPs, with a 4.3% relative improvement in MPJPE compared to other approaches. The authors acknowledge a marginal increase in computational cost with their STA-Mamba architecture, but highlight the significant accuracy gains achieved.
Future work will include public release of the code developed for this research, potentially enabling further investigation and application of this novel framework.

👉 More information
🗞 Towards Geometry-Aware and Motion-Guided Video Human Mesh Recovery
🧠 ArXiv: https://arxiv.org/abs/2601.21376

Rohail T.

A quantum scientist exploring the frontiers of physics and technology, I focus on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
