Reconstructing accurate, realistic facial geometry remains a major obstacle in creating digital avatars: existing pipelines often require extensive manual effort or break down on the variability of real-world images. Xin Ming, Yuxuan Han, and Tianyu Huang, from Tsinghua University, alongside Feng Xu, present VGGTFace, a method that automatically produces topologically consistent facial reconstructions from everyday multi-view images. The team applies the 3D foundation model VGGT to face reconstruction and augments its output with topological information, so that the result is a detailed mesh sharing the topology of a template face rather than an unstructured point cloud. Because it inherits VGGT’s generalization ability and expressive power, the approach avoids key limitations of existing methods, delivers high-quality reconstructions in roughly 10 seconds for 16 input views, and performs strongly on benchmark datasets and real-world images.
Evaluation on NeRSemble and In-the-Wild Data
This research details a method for reconstructing faces from multiple images, even under challenging real-world conditions. The approach combines VGGT, Pixel3DMM, and a topology-aware bundle adjustment technique called TopBA, which refines the geometry while preserving the topological consistency of the reconstructed face, ensuring a valid and realistic shape. The method is evaluated on NeRSemble, a challenging multi-view benchmark dataset featuring diverse facial expressions. To test robustness in realistic conditions, the researchers also captured their own in-the-wild dataset using mobile phones.
Experiments show that the method handles motion blur, producing accurate results even from imperfect input images, and that it outperforms existing techniques because it exploits multi-view information and a powerful underlying model. Visualizations confirm that the reconstructed mesh aligns well with the input images, although the method struggles with subjects wearing glasses because of reflections and transparency. Quantitative comparisons show state-of-the-art results on both standard datasets and the challenging NeRSemble dataset. A simpler baseline that combines VGGT, Pixel3DMM, and FLAME performs worse than the full method. The authors also discuss the method’s limitations and identify areas for future work.
Facial Geometry from Images Using VGGT Foundation
This study introduces VGGTFace, a method for reconstructing topologically consistent facial geometry from multi-view images captured in everyday settings. Existing methods often require manual intervention or are constrained by the limited expressiveness of traditional 3D Morphable Models. VGGTFace instead uses the pretrained weights of the VGGT foundation model to infer pixel-aligned point maps and camera parameters. To inject topological information, the team augments these point maps with pixel-aligned UV values derived from Pixel3DMM; each UV value links a 3D point to a position on a template face mesh, which converts the point map into a point cloud with defined topology. To refine the mesh reconstruction, the researchers developed a Topology-Aware Bundle Adjustment strategy that adds a Laplacian energy term to the Bundle Adjustment objective. The entire process achieves high-quality reconstruction in just 10 seconds using 16 input views on a single NVIDIA RTX 4090 GPU, and it generalizes well to in-the-wild data, opening the possibility for everyday users to create digital avatars from readily captured images.
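The step that turns a point map into a point cloud with defined topology can be illustrated with a minimal sketch. It assumes that each face pixel is assigned to the template vertex whose UV coordinate is nearest, and that the 3D points collected by a vertex are averaged; the paper may establish the correspondence differently. The function name `pointmap_to_template_points` and all array layouts are hypothetical and are not taken from VGGT or Pixel3DMM.

```python
import numpy as np

def pointmap_to_template_points(point_map, uv_map, mask, template_uv):
    """Hypothetical helper: attach per-pixel 3D points to template vertices.

    point_map:   (H, W, 3) per-pixel 3D points (as a point-map model would output)
    uv_map:      (H, W, 2) per-pixel UV coordinates on the template surface
    mask:        (H, W) bool, True for pixels that belong to the face
    template_uv: (V, 2) UV coordinate of every template vertex
    Returns a (V, 3) array of vertex positions and a (V,) bool array marking
    vertices that received at least one pixel.
    """
    pts = point_map[mask]   # (N, 3) points of face pixels
    uvs = uv_map[mask]      # (N, 2) matching UV values

    # Assumption: nearest template vertex in UV space (brute force for clarity).
    d2 = ((uvs[:, None, :] - template_uv[None, :, :]) ** 2).sum(axis=-1)  # (N, V)
    nearest = d2.argmin(axis=1)

    num_vertices = template_uv.shape[0]
    sums = np.zeros((num_vertices, 3))
    counts = np.zeros(num_vertices)
    np.add.at(sums, nearest, pts)      # accumulate points per vertex
    np.add.at(counts, nearest, 1.0)    # count pixels per vertex

    valid = counts > 0
    vertices = np.zeros((num_vertices, 3))
    vertices[valid] = sums[valid] / counts[valid, None]
    return vertices, valid
```

Because every output row is indexed by a template vertex, points from different views that land on the same vertex are directly comparable, which is what makes a later multi-view fusion step possible. Vertices flagged as invalid were simply not observed in that view.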
Topological Facial Reconstruction From Everyday Images
VGGTFace reconstructs topologically consistent facial geometry from multi-view images captured in everyday settings. First, the VGGT foundation model infers pixel-aligned point maps and camera parameters from the input images. Next, Pixel3DMM supplies pixel-aligned UV values that augment these point maps, establishing correspondence between points and a template face mesh. Finally, to fuse the point clouds from multiple views, the team proposes a Topology-Aware Bundle Adjustment strategy that minimizes re-projection error while a Laplacian energy term regularizes the optimization and compensates for inaccuracies in the input data. Experiments show that VGGTFace achieves state-of-the-art results on benchmark datasets and generalizes well to in-the-wild data, capturing person-specific facial traits with high fidelity. Because the regularization exploits the topology of the template mesh, the reconstruction stays robust and accurate even with noisy input. This opens the door for everyday users to create digital avatars from standard multi-view images, without specialized equipment or extensive manual intervention.
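To make the idea of a bundle adjustment objective with a Laplacian regularizer concrete, here is a small sketch of such an energy. It is an illustration under stated assumptions, not the paper's formulation: it assumes pinhole cameras, a uniform (unweighted) graph Laplacian over the template edges, and a regularizer that penalizes deviation from the template's Laplacian coordinates. The names `project`, `uniform_laplacian`, `topology_aware_energy`, and the weight `lam` are all hypothetical.

```python
import numpy as np

def project(points, K, R, t):
    """Pinhole projection of (V, 3) world points to (V, 2) pixel coordinates."""
    cam = points @ R.T + t
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:3]

def uniform_laplacian(vertices, edges):
    """Each vertex minus the mean of its neighbors along the mesh edges."""
    vertices = np.asarray(vertices, dtype=float)
    neighbor_sum = np.zeros_like(vertices)
    degree = np.zeros(len(vertices))
    for i, j in edges:
        neighbor_sum[i] += vertices[j]
        neighbor_sum[j] += vertices[i]
        degree[i] += 1
        degree[j] += 1
    degree = np.maximum(degree, 1)  # isolated vertices keep a zero neighbor mean
    return vertices - neighbor_sum / degree[:, None]

def topology_aware_energy(vertices, cameras, observations, edges,
                          template_vertices, lam=1.0):
    """Re-projection error summed over views plus a Laplacian regularizer.

    cameras:      list of (K, R, t) tuples, one per view
    observations: list of (pixels, visible) per view, with pixels of shape
                  (V, 2) and visible a (V,) bool array
    lam:          assumed weight of the Laplacian term
    """
    reprojection = 0.0
    for (K, R, t), (pixels, visible) in zip(cameras, observations):
        residual = project(vertices, K, R, t)[visible] - pixels[visible]
        reprojection += (residual ** 2).sum()

    # Assumption: keep the local shape close to that of the template mesh.
    lap_diff = (uniform_laplacian(vertices, edges)
                - uniform_laplacian(template_vertices, edges))
    return reprojection + lam * (lap_diff ** 2).sum()
```

In a full pipeline an optimizer would minimize this energy over the vertex positions, and possibly the camera parameters. The Laplacian term matters because re-projection error alone can leave some displacements unconstrained, for example moving a vertex along a camera's viewing ray in a single-view setup, whereas the mesh connectivity still penalizes such a move.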
Topological Reconstruction From Everyday Images Achieved
This research presents a method for reconstructing topologically consistent facial geometry from everyday multi-view images, achieving high-quality results in just ten seconds. The team integrates the 3D foundation model VGGT with Pixel3DMM to inject topological information, converting the predicted point maps into a point cloud with defined topology, which is then refined using the newly developed Topology-Aware Bundle Adjustment strategy. The method achieves state-of-the-art performance on established benchmarks and generalizes well to real-world, in-the-wild data, overcoming limitations of previous approaches that rely on 3D Morphable Models or complex manual processes. The authors acknowledge that reconstruction quality depends on the input images, and they highlight the method’s robustness and efficiency as its key achievements. Future work could explore applying 3D foundation models to other face-related tasks, potentially addressing the scarcity of available 3D facial data and enabling high-quality reconstruction for a wider range of users and scenarios.
👉 More information
🗞 VGGTFace: Topologically Consistent Facial Geometry Reconstruction in the Wild
🧠 ArXiv: https://arxiv.org/abs/2511.20366
