Meta brings Segment Anything to audio, letting editors pull sounds from video with a click or text prompt

2025-12-29

Summary

Meta has introduced SAM Audio, an AI model that separates specific sound sources from an audio mix, guided by text prompts, clicks in a video, or timestamps. It combines visual and audio data to isolate sounds such as voices or instruments, although it currently struggles to distinguish similar sounds and does not yet support audio prompts.

Why This Matters

SAM Audio represents a significant advance in audio editing: it integrates visual cues with sound separation, which can streamline tasks in music production, podcasting, and film editing. Evaluated on real-world audio and video data, Meta's system promises more practical and precise audio separation than previous models.

How You Can Use This Info

Professionals in media and content creation can leverage SAM Audio to efficiently remove unwanted noise or isolate specific sounds, improving the quality of their audio projects. Additionally, understanding the limitations of current AI models helps set realistic expectations for their use in complex audio editing tasks. You can explore this technology further through Meta's Segment Anything Playground.
