Meta has unveiled Segment Anything Model 3 (SAM 3), a unified foundational model that finally brings robust, open-vocabulary language understanding to visual segmentation and tracking.
The original Segment Anything Model (SAM) was a watershed moment for computer vision, allowing users to instantly mask any object in an image using simple visual prompts like points or bounding boxes. Now, Meta is pushing the technology into the realm of true multimodal understanding with Segment Anything Model 3 (SAM 3), a unified system for detection, segmentation, and tracking that responds directly to complex text prompts.
