Researchers are tackling the challenge of 3D mesh generation with a novel approach that mimics the artistic process of building shapes through extrusion. Thor V. Christiansen and Alba Reinders from the Technical University of Denmark, alongside Karran Pandey and Karan Singh from the University of Toronto, present Text Encoded Extrusion, a text-based representation and method utilising large language models to construct meshes as sequences of face extrusions. This work represents a significant advance as it naturally accommodates arbitrary mesh complexity and guarantees manifold geometry, addressing limitations found in contemporary transformer-based models. Furthermore, the team, including Morten R. Hannemose and J. Andreas Bærentzen, demonstrate the potential for both generating new shapes and editing existing ones by leveraging learnt extrusion sequences, opening avenues for intuitive 3D modelling and manipulation.
This innovative approach utilises a large language model (LLM) to generate 3D meshes, mirroring the artistic process of building meshes through sequential extrusion steps.
By learning these extrusion sequences, the research overcomes limitations of existing methods and naturally supports arbitrary output face counts, producing manifold meshes by design. The core idea is to decompose complex quadrilateral meshes into their constituent face loops, which serve as building blocks, and then finetune an LLM to reassemble them through a series of precisely defined extrusions.
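To make the basic operation concrete, here is a minimal Python sketch of extruding one quad face of a mesh. It is a generic illustration of face extrusion, not the paper's implementation: the names `extrude_face` and `unit_cube`, and the plain list-of-tuples mesh layout, are assumptions chosen for this example.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]
Quad = Tuple[int, int, int, int]


def unit_cube() -> Tuple[List[Vec], List[Quad]]:
    """Hypothetical helper: a unit cube made of six outward-facing quads."""
    verts = [
        (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
        (0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 1.0), (0.0, 1.0, 1.0),
    ]
    faces = [
        (0, 3, 2, 1),  # bottom
        (4, 5, 6, 7),  # top
        (0, 1, 5, 4),  # front
        (1, 2, 6, 5),  # right
        (2, 3, 7, 6),  # back
        (3, 0, 4, 7),  # left
    ]
    return verts, faces


def extrude_face(
    verts: List[Vec], faces: List[Quad], face_idx: int, offset: Vec
) -> Tuple[List[Vec], List[Quad]]:
    """Extrude one quad by `offset` and return a new (verts, faces) pair."""
    verts = list(verts)  # copy so the input mesh is left untouched
    faces = list(faces)
    old = faces[face_idx]
    base = len(verts)
    # Duplicate the four corners, shifted by the offset vector.
    for vi in old:
        x, y, z = verts[vi]
        verts.append((x + offset[0], y + offset[1], z + offset[2]))
    cap = (base, base + 1, base + 2, base + 3)
    # The shifted copy replaces the original face as the cap.
    faces[face_idx] = cap
    # One side quad per edge of the original face; together they form a ring.
    for k in range(4):
        a, b = old[k], old[(k + 1) % 4]
        a2, b2 = cap[k], cap[(k + 1) % 4]
        faces.append((a, b, b2, a2))
    return verts, faces
```

In this sketch each extrusion adds four vertices and a net four faces, so the face count grows with the number of extrusion steps rather than being fixed in advance.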
This work demonstrates a significant advancement in 3D shape generation, enabling not only the reconstruction of existing meshes but also the synthesis of entirely new shapes and the seamless addition of features to pre-existing models. The learnt extrusion sequences provide a powerful mechanism for mesh editing, offering a level of control previously unavailable in many contemporary techniques.
Unlike implicit surface representations, which often struggle with sharp features and yield overly dense triangle meshes, TEE facilitates the creation of compact and well-defined 3D models. The method also circumvents the limitations of fixed spatial grids, allowing for greater freedom in vertex positioning and detail.
Researchers trained their model on a library of quadrilateral meshes, effectively teaching the LLM the principles of mesh construction through extrusion. This process allows the model to predict continuous quantities, such as vertex coordinates, without being restricted to predefined spatial grids, a common constraint in other methods.
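The difference between a fixed grid and free coordinates can be illustrated with a small Python sketch. The uniform grid on [-1, 1], the bin counts, and the function names `quantize` and `as_text` are all hypothetical and chosen only for illustration; they are not taken from the paper or from any specific competing model.

```python
def quantize(x: float, bins: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Snap x (assumed to lie in [lo, hi]) to the centre of one of `bins` equal cells."""
    width = (hi - lo) / bins
    index = min(int((x - lo) / width), bins - 1)  # clamp x == hi into the last cell
    return lo + (index + 0.5) * width


def as_text(x: float, digits: int = 4) -> float:
    """Round-trip x through a decimal string with a chosen number of digits."""
    return float(f"{x:.{digits}f}")


x = 0.3
coarse = quantize(x, bins=8)    # grid cell centre nearest to x on an 8-cell grid
fine = quantize(x, bins=128)    # finer grid, smaller but still nonzero error
written = as_text(x)            # decimal text; precision set by digit count
```

A grid can place a coordinate only at a cell centre, so the error is bounded by half the cell width. Decimal text still rounds, but its precision is set by the number of digits written rather than by a spatial grid.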
The resulting system generates connected meshes directly, avoiding the “soup of primitives” often produced by sequential generation techniques. Demonstrations using the DFAUST dataset showcase the generation of realistic upper bodies, highlighting the potential for creating complex and detailed 3D forms. The code is publicly available, alongside new quadrilateral mesh datasets derived from DFAUST and MANO.
Loop Decomposition and Language Model Finetuning for Extrusion Sequence Prediction
Text Encoded Extrusion (TEE) represents a novel approach to 3D mesh generation, utilising sequences of face extrusions rather than traditional polygon lists. The methodology centres on decomposing a library of quadrilateral meshes, each possessing non-self-intersecting face loops, into constituent loops which function as fundamental building blocks.
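One way to picture a face loop: in a quad mesh, entering a face through one edge and leaving through the opposite edge, repeatedly, traces a ring of quads. The Python sketch below walks such a ring on a cube. The function names and data layout are assumptions for illustration; the paper's actual decomposition procedure is not described in this article and may differ.

```python
from typing import Dict, FrozenSet, List, Tuple

Quad = Tuple[int, int, int, int]

# Cube connectivity only; vertex positions are irrelevant for tracing loops.
CUBE_FACES: List[Quad] = [
    (0, 3, 2, 1), (4, 5, 6, 7), (0, 1, 5, 4),
    (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7),
]


def face_edge(face: Quad, k: int) -> FrozenSet[int]:
    """Undirected k-th edge of a quad (k is taken modulo 4)."""
    return frozenset((face[k % 4], face[(k + 1) % 4]))


def trace_face_loop(faces: List[Quad], start_face: int, entry_edge: int) -> List[int]:
    """Walk across opposite edges from `start_face` until the ring closes.

    Assumes the loop does not self-intersect; stops early at a mesh boundary.
    """
    edge_to_faces: Dict[FrozenSet[int], List[int]] = {}
    for fi, f in enumerate(faces):
        for k in range(4):
            edge_to_faces.setdefault(face_edge(f, k), []).append(fi)

    loop = [start_face]
    current, k = start_face, entry_edge
    for _ in range(len(faces)):  # iteration guard against malformed input
        exit_edge = face_edge(faces[current], k + 2)  # opposite the entry edge
        neighbours = [g for g in edge_to_faces[exit_edge] if g != current]
        if not neighbours or neighbours[0] == start_face:
            break
        current = neighbours[0]
        # The edge just crossed becomes the entry edge of the next face.
        k = next(j for j in range(4) if face_edge(faces[current], j) == exit_edge)
        loop.append(current)
    return loop
```

On the cube, starting from the top face and crossing its first edge yields a ring of four faces; starting across the adjacent edge yields the other ring through the same face.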
An LLM is then finetuned on the process of reassembling these meshes through a defined sequence of extrusions. This study innovates by framing mesh construction as a language modelling task, converting extrusion commands into textual representations. The team finetuned an existing large language model, leveraging its capabilities to learn and predict sequences of extrusions necessary to build a 3D mesh.
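As a rough illustration of what "extrusion commands as text" could look like, the Python sketch below serialises a list of extrusion steps to one line each and parses them back. The command syntax (`extrude loop=... d=...`), the `Extrusion` fields, and the function names are hypothetical; the article does not specify the actual TEE format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Extrusion:
    loop_id: int                        # hypothetical: which loop to extrude
    offset: Tuple[float, float, float]  # hypothetical: displacement vector


def encode(steps: List[Extrusion], digits: int = 4) -> str:
    """Serialise steps, one line each, e.g. 'extrude loop=2 d=0.1250,-0.2500,0.0000'."""
    lines = []
    for s in steps:
        coords = ",".join(f"{c:.{digits}f}" for c in s.offset)
        lines.append(f"extrude loop={s.loop_id} d={coords}")
    return "\n".join(lines)


def decode(text: str) -> List[Extrusion]:
    """Parse the format written by `encode`; raises ValueError on malformed lines."""
    steps = []
    for line in text.splitlines():
        op, loop_field, d_field = line.split()
        if (
            op != "extrude"
            or not loop_field.startswith("loop=")
            or not d_field.startswith("d=")
        ):
            raise ValueError(f"unrecognised command: {line!r}")
        x, y, z = (float(c) for c in d_field[2:].split(","))
        steps.append(Extrusion(int(loop_field[5:]), (x, y, z)))
    return steps
```

With a representation of this kind, a longer or more detailed shape is simply a longer text sequence, and coordinates are written as ordinary decimal numbers.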
This circumvents limitations inherent in transformer-based models, specifically the constraints imposed by fixed context windows and difficulties in predicting continuous vertex coordinates. The research demonstrates the ability to reconstruct existing meshes, synthesise entirely new shapes, and augment existing geometry with additional features.
To facilitate this, the team created quadrilateral versions of the DFAUST and MANO datasets, providing a dedicated resource for training and evaluation. The approach naturally supports arbitrary output face counts and produces manifold meshes by construction. This sets it apart both from recent transformer-based models, which often struggle with these aspects, and from iso-contouring methods like Marching Cubes, which often produce overly dense triangle meshes.
The learnt extrusion sequences are also applicable to existing meshes, enabling both editing and generation: the finetuned model can reconstruct meshes, synthesise novel shapes, and add features to existing geometry.
This methodology addresses limitations found in implicit surface representations, which often struggle with sharp features and generate overly dense triangle meshes. By converting extrusion commands into text, termed Text Encoded Extrusions, the research circumvents the limitations of transformer models regarding sequence length and continuous quantity prediction.
The framework imposes no limit on the level of detail in the output mesh and guarantees expected connectivity through the enforcement of extrusion operations. Because the commands are expressed as text, the choice of LLM used for mesh generation is left open, and features can be added to existing meshes at user-specified regions.
The work introduces a learning-based methodology for generating 3D meshes via sequences of extrusions, employing a large language model for the task. Researchers published their code and released two new datasets consisting of quadrilateral versions of the triangle meshes from the DFAUST and MANO datasets, furthering the availability of resources for this research area. This approach contrasts with methods that predict single primitives at a time, as it inherently guarantees manifold meshes, a significant advantage over existing techniques.
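A standard necessary condition for a closed manifold mesh is that every edge is shared by exactly two faces. The Python sketch below checks that condition; it is a generic test written for illustration (function names are assumptions), and it does not check vertex manifoldness or self-intersection.

```python
from collections import Counter
from typing import Iterable, Sequence

# Cube with six quad faces, used as a known closed mesh.
CUBE_FACES = [
    (0, 3, 2, 1), (4, 5, 6, 7), (0, 1, 5, 4),
    (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7),
]


def edge_use_counts(faces: Iterable[Sequence[int]]) -> Counter:
    """Count how many faces use each undirected edge."""
    counts: Counter = Counter()
    for f in faces:
        n = len(f)
        for k in range(n):
            a, b = f[k], f[(k + 1) % n]
            counts[(min(a, b), max(a, b))] += 1
    return counts


def is_closed_edge_manifold(faces: Iterable[Sequence[int]]) -> bool:
    """True if every edge is shared by exactly two faces."""
    return all(c == 2 for c in edge_use_counts(faces).values())
```

A "soup" of independently predicted primitives can easily fail this test, for example through a missing face (edges used once) or a duplicated face (edges used three times), whereas the cube above passes it.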
Learned extrusion sequences facilitate manifold mesh generation and manipulation for complex geometries
Scientists have developed a new technique for generating three-dimensional meshes using large language models by representing mesh construction as a series of face extrusions. This approach differs from existing methods that directly construct meshes from primitives, instead learning sequences of extrusions that mimic the process used by artists.
The resulting representation supports arbitrary output face counts and inherently produces valid, manifold meshes, addressing a limitation of recent transformer-based models. This method enables both the reconstruction of existing meshes and the creation of novel shapes, as well as the addition of new features to established designs.
Training involves decomposing quadrilateral meshes into constituent loops and then finetuning a language model to reassemble them through learned extrusion sequences. Demonstrations include successful reconstruction, the synthesis of new shapes, and the modification of existing meshes, exemplified by results on the DFAUST and MANO hand mesh datasets, and the generation of varied models from a diverse mesh database.
Current limitations include support only for meshes with spherical topology (genus zero): meshes of higher genus are difficult to handle because the underlying data structure cannot easily be ordered linearly. Complex branching extrusion sequences are also challenging, as approximation errors can accumulate along them.
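For readers unfamiliar with the term, the genus of a closed, connected, orientable mesh follows from Euler's formula V - E + F = 2 - 2g. The Python sketch below computes it for a cube (genus 0, spherical topology) and for a quad torus (genus 1). The helper names are assumptions for this example, and the code assumes every vertex index is used by at least one face.

```python
from typing import List, Sequence, Tuple


def genus(num_vertices: int, faces: Sequence[Sequence[int]]) -> int:
    """Genus of a closed, connected, orientable mesh via V - E + F = 2 - 2g."""
    edges = set()
    for f in faces:
        n = len(f)
        for k in range(n):
            a, b = f[k], f[(k + 1) % n]
            edges.add((min(a, b), max(a, b)))
    euler_characteristic = num_vertices - len(edges) + len(faces)
    return (2 - euler_characteristic) // 2


def torus_quads(n: int, m: int) -> List[Tuple[int, int, int, int]]:
    """Quad faces of an n-by-m grid with wrap-around in both directions (n, m >= 3)."""
    def vid(i: int, j: int) -> int:
        return (i % n) * m + (j % m)

    return [
        (vid(i, j), vid(i + 1, j), vid(i + 1, j + 1), vid(i, j + 1))
        for i in range(n)
        for j in range(m)
    ]


CUBE_FACES = [
    (0, 3, 2, 1), (4, 5, 6, 7), (0, 1, 5, 4),
    (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7),
]

cube_genus = genus(8, CUBE_FACES)             # 8 - 12 + 6 = 2, so g = 0
torus_genus = genus(16, torus_quads(4, 4))    # 16 - 32 + 16 = 0, so g = 1
```

Shapes such as a mug with a handle fall into the genus-one category and are therefore outside what the method currently supports, according to the limitations stated above.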
Future research may focus on extending the method to handle more complex topologies and longer, branching extrusion sequences, potentially improving the robustness and versatility of the approach. This work establishes a pathway for mesh generation and editing driven by language models, prioritising the number of mesh features over the number of faces and offering a valuable tool for designers.
👉 More information
🗞 Learning to Build Shapes by Extrusion
🧠 ArXiv: https://arxiv.org/abs/2601.22858
