NVIDIA Unveils Cosmos Policy for Superior Robot Manipulation and Planning

    NVIDIA has unveiled Cosmos Policy, a robot control and planning policy that builds on its world foundation models to improve performance on complex manipulation tasks. The release extends the NVIDIA Cosmos platform, which targets robotics, autonomous vehicle development, and industrial vision AI.

    Cosmos Policy builds on the Cosmos Predict-2 model by fine-tuning it specifically for robotic manipulation. This approach encodes robot actions, future states, and success indicators (values) directly in the sequence the model processes, treating them as additional frames alongside the video frames. Because these quantities are generated by the same diffusion process the model uses for video, it can draw on its pretrained knowledge of physical dynamics, such as gravity and how scenes evolve, to provide visuomotor control, world modeling, and value prediction for planning within a single model.
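
To make the "additional frames" idea concrete, here is a minimal sketch of how such a sequence might be laid out. It is illustrative only: the `Slot` class, the `build_sequence` function, the mode names, and the choice of which slots are given versus generated are all assumptions made for this example, not the model's actual tensor layout or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    """One frame-like position in the sequence (hypothetical structure)."""
    kind: str          # "image", "action", "future_state", or "value"
    conditioned: bool  # True: supplied as input; False: produced by denoising


def build_sequence(num_image_frames: int, mode: str) -> List[Slot]:
    """Lay out frame-like slots for one pass (assumed layout, not the real one)."""
    if num_image_frames < 1:
        raise ValueError("need at least one observed image frame")

    # Observed camera frames are always given to the model.
    slots = [Slot("image", True) for _ in range(num_image_frames)]

    if mode == "policy":
        # Assumption: the action is generated together with its predicted outcome.
        slots.append(Slot("action", False))
    elif mode == "world_model":
        # Assumption: the action is supplied and only its outcome is predicted.
        slots.append(Slot("action", True))
    else:
        raise ValueError(f"unknown mode: {mode}")

    slots.append(Slot("future_state", False))
    slots.append(Slot("value", False))
    return slots


if __name__ == "__main__":
    for slot in build_sequence(num_image_frames=2, mode="policy"):
        print(slot)
```

The point of the sketch is only that one sequence format can serve several roles: changing which slots are supplied and which are generated switches between acting and predicting outcomes without changing the backbone.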

    Unlike traditional methods that rely on separate networks for perception and action, or vision-language models limited to high-level instructions, Cosmos Policy uses a single pretrained backbone to generate precise actions alongside their anticipated outcomes. This design allows it to be deployed either as a direct action generator or in a planning mode that evaluates several candidate action sequences and selects the one with the best predicted outcome.
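
The planning mode described above can be sketched as a simple best-of-N search: sample several candidate actions, score each with a predicted value, and keep the highest-scoring one. The sketch below assumes this generic scheme and is not NVIDIA's implementation; `plan_best_of_n`, `Candidate`, and `make_toy_model` are hypothetical names, and the toy sampler and value function merely stand in for the model's action and value outputs.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Action = List[float]
Observation = List[float]


@dataclass
class Candidate:
    action: Action
    value: float  # predicted score for this action; higher is better


def plan_best_of_n(
    sample_action: Callable[[Observation], Action],
    predict_value: Callable[[Observation, Action], float],
    observation: Observation,
    num_candidates: int = 8,
) -> Candidate:
    """Sample candidate actions and return the one with the highest predicted value."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = []
    for _ in range(num_candidates):
        action = sample_action(observation)
        candidates.append(Candidate(action, predict_value(observation, action)))
    return max(candidates, key=lambda c: c.value)


def make_toy_model(target: Action, seed: int = 0) -> Tuple[Callable, Callable]:
    """Toy stand-ins for a learned policy and value head (assumption for the demo)."""
    rng = random.Random(seed)

    def sample_action(observation: Observation) -> Action:
        # Noisy proposal around the current observation.
        return [o + rng.gauss(0.0, 0.5) for o in observation]

    def predict_value(observation: Observation, action: Action) -> float:
        # Negative squared distance to the target: closer actions score higher.
        return -sum((a - t) ** 2 for a, t in zip(action, target))

    return sample_action, predict_value


if __name__ == "__main__":
    sampler, scorer = make_toy_model(target=[1.0, 0.0])
    best = plan_best_of_n(sampler, scorer, [0.0, 0.0], num_candidates=16)
    print(best)
```

With `num_candidates=1` the function reduces to the direct action-generator mode: the single sampled action is returned without any comparison.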

    On the established LIBERO and RoboCasa benchmarks, Cosmos Policy outperformed the baselines it was compared against. On LIBERO, it achieved an average success rate of 98.5 percent across task categories, ahead of OpenVLA-OFT at 97.1 percent and CogVLA at 97.4 percent. In RoboCasa simulations, with just 50 training demonstrations per task, it reached 67.1 percent success, edging out baselines such as Video Policy at 66.0 percent despite using fewer examples. Real-world trials on the ALOHA robot platform demonstrated its effectiveness in bimanual tasks, with the planning mode raising completion rates by 12.5 percent on average.
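
For readers who want the margins spelled out, the short snippet below computes the gaps, in percentage points, between Cosmos Policy and the baselines quoted above. It uses only the figures reported in this article; the `margins` helper is a hypothetical name introduced for illustration.

```python
# Success rates (percent) as quoted in the article.
libero = {"Cosmos Policy": 98.5, "OpenVLA-OFT": 97.1, "CogVLA": 97.4}
robocasa = {"Cosmos Policy": 67.1, "Video Policy": 66.0}


def margins(scores, reference="Cosmos Policy"):
    """Return the reference model's lead over each other entry, in percentage points."""
    ref = scores[reference]
    # Round to one decimal to avoid floating-point noise in the subtraction.
    return {
        name: round(ref - score, 1)
        for name, score in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    print("LIBERO:", margins(libero))
    print("RoboCasa:", margins(robocasa))
```

The leads are between 1.1 and 1.4 percentage points on these benchmarks, so the RoboCasa result is notable mainly because it was reached with fewer demonstrations.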

    These gains underscore the value of initializing from video-pretrained models like Cosmos Predict, which provide a strong foundation for handling temporal and multimodal data essential to robotics. For more details on the methodology and findings, refer to the Cosmos Policy research page.

    In tandem with this release, NVIDIA launched the Cosmos Cookoff, an open hackathon encouraging developers to experiment with Cosmos models in physical AI applications spanning robotics, autonomous vehicles, and video analysis. Running from January 29 to February 26, the event supports teams of up to four and offers prizes including a $5,000 cash award, an NVIDIA DGX Spark system, and an NVIDIA GeForce RTX 5090 GPU. Entries will be judged by specialists from Datature, Hugging Face, Nebius, Nexar, and NVIDIA. Participants can register at the Cosmos Cookoff site.

    Models and code for Cosmos Policy are openly available, promoting further innovation in the field through resources like the Cosmos Cookbook for practical implementation.

