Microsoft Unveils TRELLIS.2 AI for Lightning-Fast 2D-to-3D Transformations

    Microsoft has unveiled TRELLIS.2-4B, an AI model that turns a single 2D image into a detailed 3D asset in as little as a few seconds. Developed by a team of researchers including Jianfeng Xiang, Xiaoxue Chen, and Jiaolong Yang, the 4-billion-parameter system pairs a flow-matching transformer with a sparse voxel representation called O-Voxel to generate complex 3D objects complete with physically based rendering (PBR) materials.

    What sets TRELLIS.2 apart is its handling of geometry that defeats older techniques. Traditional methods often falter on open surfaces, irregular shapes, or non-manifold structures. O-Voxel lets the model create and reconstruct a wider range of 3D forms, from fully enclosed objects to those with sharp edges and translucent elements, while embedding appearance details such as opacity directly in the geometry.

    The model takes a single input image and outputs a textured mesh ready for use in graphics applications. It supports voxel grid resolutions from 512 cubed up to 1536 cubed for high-detail results. A key component is the sparse 3D variational autoencoder, which applies 16 times spatial downsampling to compress even a 1024 cubed asset into about 9,600 latent tokens with minimal loss in quality.
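To put the compression figure in perspective, the short Python sketch below works through the arithmetic. It assumes, as an interpretation rather than something the article states, that each latent token corresponds to one occupied cell of the downsampled grid. The function names are illustrative and not part of TRELLIS.2.

```python
def dense_latent_cells(resolution: int, downsample: int = 16) -> int:
    """Cells in a dense latent grid after spatial downsampling along each axis."""
    if resolution % downsample != 0:
        raise ValueError("resolution must be divisible by the downsampling factor")
    side = resolution // downsample
    return side ** 3


def token_fraction(tokens: int, resolution: int, downsample: int = 16) -> float:
    """Share of the dense latent grid that the reported token count would fill."""
    return tokens / dense_latent_cells(resolution, downsample)


if __name__ == "__main__":
    # Resolutions quoted in the article.
    for res in (512, 1024, 1536):
        print(f"{res} cubed -> {dense_latent_cells(res):,} dense latent cells")
    # About 9,600 tokens for a 1024 cubed asset (figure from the article).
    print(f"token fraction at 1024 cubed: {token_fraction(9_600, 1024):.2%}")
```

Under that assumption, a 1024 cubed asset maps to a 64 cubed latent grid of 262,144 cells, so roughly 9,600 tokens would cover under 4 percent of it. That is consistent with a sparse representation that stores only the occupied regions.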

    Efficiency is a standout feature. On an NVIDIA H100 GPU, generation takes around three seconds at 512 cubed resolution, 17 seconds at 1024 cubed, and about a minute at the highest 1536 cubed level. The O-Voxel format also converts between meshes and voxels quickly, without optimization steps, making it practical for real-world workflows. In addition, the model can generate textures conditioned on existing 3D shapes and reference images.
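As a rough illustration of how generation time grows with resolution, the sketch below estimates a power-law exponent from the three timings quoted above. The timings are rounded figures from the article, so the result is only indicative, and the function name is made up for this example.

```python
import math

# Reported H100 generation times from the article: resolution -> seconds.
REPORTED_SECONDS = {512: 3.0, 1024: 17.0, 1536: 60.0}


def scaling_exponent(res_a: int, res_b: int, times=REPORTED_SECONDS) -> float:
    """Exponent k such that time grows like resolution**k between two data points."""
    return math.log(times[res_b] / times[res_a]) / math.log(res_b / res_a)


if __name__ == "__main__":
    for a, b in ((512, 1024), (1024, 1536), (512, 1536)):
        print(f"{a} -> {b}: k = {scaling_exponent(a, b):.2f}")
```

An exponent of 3 would mean time grows in step with the number of cells in a dense grid. With only three approximate data points, the estimates should be read as a ballpark, not a measurement.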

    To run TRELLIS.2, users need a Linux system with an NVIDIA GPU that has at least 24GB of VRAM, such as the A100 or H100. It relies on CUDA Toolkit version 12.4, Conda for dependency management, and Python 3.8 or later. The model is released as a pre-trained foundation model, meaning it has not undergone human preference alignment or style-specific tuning, so results reflect the diversity of the training data and may require adjusting the input to achieve a particular aesthetic.
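A quick pre-flight check for some of these requirements could look like the Python sketch below. It is not part of the TRELLIS.2 codebase. It covers only the operating system, Python version, and GPU memory, and does not verify the CUDA Toolkit or Conda. The 24,000 MiB cutoff is an assumption, since cards sold as 24GB report slightly different totals, so adjust it as needed.

```python
import platform
import subprocess
import sys
from typing import List

MIN_PYTHON = (3, 8)
MIN_VRAM_MIB = 24_000  # assumption: rough cutoff for a "24GB" card, in MiB


def parse_vram_mib(smi_output: str) -> List[int]:
    """Parse one memory.total value (MiB) per line, skipping non-numeric lines."""
    values = []
    for line in smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def query_vram_mib() -> List[int]:
    """Ask nvidia-smi for total memory per GPU; return [] if it is unavailable."""
    cmd = ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_vram_mib(result.stdout)


def check_environment(system: str, python_version, vram_mib: List[int]) -> List[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    if system != "Linux":
        problems.append(f"expected Linux, found {system}")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.8 or later is required")
    if not vram_mib:
        problems.append("no NVIDIA GPU detected via nvidia-smi")
    elif max(vram_mib) < MIN_VRAM_MIB:
        problems.append(f"largest GPU has {max(vram_mib)} MiB, below {MIN_VRAM_MIB} MiB")
    return problems


if __name__ == "__main__":
    issues = check_environment(platform.system(), sys.version_info, query_vram_mib())
    print("Basic checks passed" if not issues else "\n".join(issues))
```

Keeping the parsing and checking functions separate from the `nvidia-smi` call makes them easy to test on a machine without a GPU.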

    While powerful, the system isn’t flawless. Generated meshes might show small holes or slight topological issues, though Microsoft provides post-processing tools like hole-filling scripts for fixes, especially useful for tasks like 3D printing. The team is already iterating to refine these aspects.

    Released under the MIT License, TRELLIS.2 is open for research and experimentation. Developers can access the code on the project’s GitHub repository, with more details on the official project page. The accompanying research paper is available on arXiv for those diving deeper into the technical underpinnings.

