Meta Launches SAM 3 AI to Revolutionize Object Detection in Images and Videos

    Meta has launched Segment Anything Model 3, or SAM 3, a versatile artificial intelligence system designed to identify, outline and follow objects in photos and videos through text descriptions, example images or visual cues like points and boxes.

    The company released the model’s core components, including trained weights, evaluation benchmarks and fine-tuning guides, allowing researchers and developers to adapt it freely. This update builds on earlier versions by handling a wider array of prompts, making it easier to target specific items such as a particular umbrella or a person in motion.
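    As a rough illustration of how such a promptable interface could be driven from code, the sketch below exercises text, point and box prompts through a hypothetical Sam3Predictor wrapper. The package name, class, method signatures and checkpoint file are assumptions made for this example, not the released API.

        import numpy as np
        from PIL import Image

        # Hypothetical wrapper; the real package, class and method names
        # shipped by Meta may differ from these assumptions.
        from sam3 import Sam3Predictor

        predictor = Sam3Predictor.from_checkpoint("sam3_base.pt")  # assumed file name
        image = np.array(Image.open("street.jpg"))
        predictor.set_image(image)

        # 1. Text prompt: segment every instance matching a short noun phrase.
        umbrella_masks = predictor.predict(text="striped umbrella")

        # 2. Point prompt: a single foreground click (label 1 = foreground).
        click_masks = predictor.predict(point_coords=np.array([[420, 310]]),
                                        point_labels=np.array([1]))

        # 3. Box prompt: an (x0, y0, x1, y1) region around the target object.
        box_masks = predictor.predict(box=np.array([380, 250, 520, 400]))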

    To help users explore its potential, Meta introduced the Segment Anything Playground, an online tool where individuals can upload media and apply effects without needing advanced skills. The platform includes ready-to-use templates for tasks like blurring faces or adding visual enhancements to videos.

    SAM 3 will soon power fresh features in Instagram’s Edits video tool, letting creators add targeted effects to elements in their clips with a single action. Similar capabilities are headed to the Vibes section of the Meta AI mobile application and the meta.ai website, where users can generate and edit AI-created videos.

    In a related development, Meta unveiled SAM 3D, a collection of open tools and data for reconstructing three-dimensional objects and human forms from a single photo. This system aims to enhance real-world spatial understanding, with details available in a dedicated company blog post.

    Both SAM 3 and SAM 3D support a new visualization option on Facebook Marketplace called View in Room, which lets shoppers see how furniture like lamps or tables might look in their own environments prior to buying.

    Meta partnered with Conservation X Labs and Osa Conservation to create SA-FARI, the first public video collection for wildlife tracking. It features thousands of camera trap recordings from over 100 species, marked with outlines for every animal appearance, to aid conservation efforts.

    The model addresses gaps in prior systems by processing open-ended text prompts, such as short phrases describing uncommon items, and Meta reports it performs roughly twice as well as earlier systems on a new evaluation set called Segment Anything with Concepts (SA-Co), a benchmark that tests recognition across thousands of concepts in diverse settings.
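    To make the open-vocabulary idea concrete, here is a hedged sketch that sweeps several free-form noun phrases over one image, reusing the hypothetical predictor from the earlier example; a closed-vocabulary detector, by contrast, could only answer for labels fixed at training time.

        # Each query is an arbitrary short phrase rather than an entry
        # from a fixed label set.
        concepts = ["yellow school bus", "traffic cone", "delivery scooter"]
        results = {}
        for phrase in concepts:
            # Assumed to return one mask per detected instance of the concept.
            results[phrase] = predictor.predict(text=phrase)
        for phrase, masks in results.items():
            print(f"{phrase}: {len(masks)} instance(s) found")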

    To train SAM 3, Meta developed an efficient pipeline combining AI automation with human oversight, speeding up annotation roughly fivefold for negative prompts (concepts absent from an image) and by more than a third for fine-grained ones. The result is a vast dataset covering more than four million distinct concepts.
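    The article describes this pipeline only at a high level, but the general human-in-the-loop pattern can be sketched as below; every class and function here is a stand-in invented for illustration, not Meta’s actual tooling.

        def data_engine_pass(images, concepts, model, reviewer):
            """Sketch of an AI-first labeling loop: the model proposes masks,
            and human reviewers handle verification and correction."""
            dataset = []
            for image in images:
                for concept in concepts:
                    proposals = model.predict(image, text=concept)  # AI proposal step
                    if not proposals:
                        # Confirming absence is quick for humans, which is where
                        # the roughly fivefold speedup cited above would land.
                        labels = reviewer.confirm_absent(image, concept)
                    else:
                        labels = reviewer.correct(image, concept, proposals)
                    dataset.append((image, concept, labels))
            return dataset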

    Building on Meta’s earlier technologies, including the Perception Encoder vision backbone and the DETR detection framework, SAM 3 processes a single image in under a second on high-end hardware and handles video in near real time for a handful of objects. When paired with large language models, it can tackle intricate queries that require step-by-step reasoning.
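    One plausible way to realize that pairing is to let the language model decompose a complex request into the short noun phrases SAM 3 handles directly. In the sketch below, llm.complete is a placeholder for whatever text model is used, and predictor is the same hypothetical wrapper as before.

        def segment_complex_query(image, query, predictor, llm):
            """Sketch: an LLM rewrites an intricate query into short noun
            phrases, which are then fed to the segmentation model one by one."""
            predictor.set_image(image)
            prompt = ("Rewrite this request as a comma-separated list of short "
                      f"noun phrases naming the objects to find: {query}")
            phrases = [p.strip() for p in llm.complete(prompt).split(",")]
            masks = []
            for phrase in phrases:
                masks.extend(predictor.predict(text=phrase))
            return masks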

    In scientific applications, the technology aids marine studies through FathomNet, a database now enriched with underwater segmentation tools. Researchers can use these resources to monitor ocean life more effectively.

    Meta encourages the developer community to refine SAM 3 for specialized needs, such as medical imaging, via included customization guides and tools from partner Roboflow. Future enhancements may better handle crowded scenes or extended descriptions when integrated with other AI systems.
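    In practice, a domain adaptation run might look like the standard PyTorch loop below. The Sam3Model class, the dataset and the single-loss setup are placeholder assumptions; the actual recipe lives in the customization guides the article mentions.

        import torch
        from torch.utils.data import DataLoader

        # Placeholders: Sam3Model and MedicalMaskDataset stand in for the
        # real classes a domain team would supply.
        model = Sam3Model.from_checkpoint("sam3_base.pt")  # assumed
        loader = DataLoader(MedicalMaskDataset("scans/"), batch_size=4)
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

        model.train()
        for images, text_prompts, target_masks in loader:
            pred_masks = model(images, text=text_prompts)
            # Binary mask loss; a full recipe would likely combine several terms.
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                pred_masks, target_masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()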

    The Playground also incorporates sample videos from Meta’s Aria Gen 2 research eyewear, showcasing the model’s strength in first-person views for fields like robotics and environmental awareness.

