How AI Actually Splits A Song Into Stems
Pulling the vocal, drums and bass out of a finished mix sounds impossible, like un-baking a cake. Here's what's really happening, and why some parts come out clean while others stay a little messy.
Short answer
How does an AI stem splitter work?
An AI stem splitter separates a mixed song into its parts (vocals, drums, bass, guitar and piano) by recognizing each instrument’s fingerprint in a spectrogram, a picture of the sound's frequencies over time, and rebuilding every part as its own track. Riffloop runs this separation on your device, on the YouTube song you’re watching or a file you upload, so you can solo or mute any part while you practice.
When you play a song, your speaker cone is doing exactly one thing: moving back and forth along a single path. Every instrument, every voice, the whole band, has already been summed into one blended waveform. There is no hidden "vocal track" tucked inside the file waiting to be extracted. There's just the sum.
So asking software to hand you back the isolated vocal is asking it to un-bake the cake, to look at the finished mix and reconstruct ingredients that were blended together and, strictly speaking, thrown away. That it works at all is genuinely impressive. Understanding how tells you exactly when to trust it.
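To make "there's just the sum" concrete, here is a toy sketch in Python. The sample values and the `mix` helper are invented for illustration. It shows that two completely different pairs of sources can add up to the identical mix, which is why the mix alone can't tell you which pair was really there.

```python
# Toy illustration: a mix is the sample-by-sample sum of its sources.
# All sample values below are made up for the example.

def mix(*sources):
    """Sum equal-length sources sample by sample into one waveform."""
    return [sum(samples) for samples in zip(*sources)]

vocal = [0.5, -0.25, 0.0, 0.25]
drums = [0.25, 0.25, -0.5, 0.0]

# A completely different pair of "sources"...
other_a = [0.75, 0.5, -0.25, 0.0]
other_b = [0.0, -0.5, -0.25, 0.25]

mix_1 = mix(vocal, drums)
mix_2 = mix(other_a, other_b)

# ...produces exactly the same waveform.
print(mix_1)           # [0.75, 0.0, -0.5, 0.25]
print(mix_1 == mix_2)  # True
```

Nothing in the numbers says which pair is the "real" one. A separator has to bring outside knowledge about what vocals and drums usually look like, and that is where training comes in.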
01 The model learns what each part looks like
AI separation models are trained on enormous libraries of songs where the isolated tracks are known: the studio has the real vocal, the real drums and the real bass on their own. The network is shown the full mix alongside those true parts, over and over, until it learns what a vocal, a snare or a bass note tends to look like in a spectrogram, a picture of frequencies over time.
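If "a picture of frequencies over time" feels abstract, the sketch below builds a tiny one. It is a deliberately naive short-time Fourier transform in plain Python, with no window function and no overlap, so treat it as an illustration of the idea rather than how any particular tool computes its spectrograms. The `spectrogram` function name and the test tone are assumptions made for the example.

```python
import cmath
import math

def spectrogram(signal, frame_size, hop):
    """Magnitude of a short-time DFT: one row per frame, one column per frequency bin."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        row = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies only
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            row.append(abs(coeff))
        frames.append(row)
    return frames

# A test tone that completes exactly 4 cycles in every 32-sample frame.
FRAME = 32
tone = [math.sin(2 * math.pi * 4 * n / FRAME) for n in range(4 * FRAME)]

spec = spectrogram(tone, frame_size=FRAME, hop=FRAME)
loudest_bin = [row.index(max(row)) for row in spec]
print(loudest_bin)  # [4, 4, 4, 4]: every frame peaks in bin 4
```

A steady tone shows up as a horizontal line in this picture. A snare hit would instead light up many bins in a single frame, which is the kind of visual difference a model can learn to tell apart.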
Then, faced with a brand-new song it has never heard, it estimates a kind of stencil for each part, deciding which frequencies at each instant belong to the vocal, which to the drums, and so on, and lifts them out. It isn't recovering the original files. It's making an educated guess, informed by every song it trained on.
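One common way to implement the stencil idea is a soft mask: for every time-frequency bin, each part gets a share of the mix in proportion to how strongly the model believes that part is present there. The sketch below is a generic illustration of that technique, not Riffloop's actual model. The function names `soft_masks` and `apply_mask` and all numbers are hypothetical, and the "guesses" stand in for whatever a trained network would output.

```python
EPS = 1e-12  # avoids dividing by zero in bins where every guess is silent

def soft_masks(estimates):
    """estimates: {stem name: [[guessed magnitude per bin] per frame]}.
    Returns masks of the same shape whose values sum to ~1 in every bin."""
    names = list(estimates)
    n_frames = len(estimates[names[0]])
    n_bins = len(estimates[names[0]][0])
    masks = {name: [] for name in names}
    for t in range(n_frames):
        totals = [
            sum(estimates[name][t][f] for name in names) + EPS
            for f in range(n_bins)
        ]
        for name in names:
            masks[name].append(
                [estimates[name][t][f] / totals[f] for f in range(n_bins)]
            )
    return masks

def apply_mask(mask, mix_spec):
    """Scale every bin of the mix spectrogram by the mask value for that bin."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mix_spec)
    ]

# One frame, three frequency bins (made-up magnitudes).
mix_spec = [[4.0, 2.0, 8.0]]
guesses = {
    "vocals": [[6.0, 0.0, 1.0]],  # hypothetical model output
    "drums":  [[2.0, 1.0, 3.0]],  # hypothetical model output
}

masks = soft_masks(guesses)
stems = {name: apply_mask(mask, mix_spec) for name, mask in masks.items()}

for name, stem in stems.items():
    print(name, [round(v, 3) for v in stem[0]])
```

Note that the guesses don't need to add up to the mix. The masks only decide how the energy that is actually in the mix gets shared out, which is why the result is an estimate carved from the mix rather than a recovered original.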
02 Why some parts come out cleaner
Not every instrument is equally easy to guess, and this is the part worth knowing before you judge a result:
- Vocals and drums separate cleanly. Voices have a distinctive fingerprint and drums are sharp bursts in time, so the model can spot them confidently. On a good mix they come out close to the original.
- Bass and piano come out very well. They occupy fairly predictable ranges, so there's less guessing.
- Synths, strings, and "everything else" are the noisiest. They smear across the same frequencies as the other instruments, so the model can't cleanly decide who owns what. That leftover "other" bucket is the roughest part of any separation, in every tool on the market, not just one.
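The effect of overlap can be shown with a small experiment. The sketch below uses the simplest possible stencil, a binary mask that hands each frequency bin entirely to whichever source is louder there, and it cheats by knowing the true sources, so it represents a best case. It also assumes magnitudes simply add in the mix, which ignores phase. The function name and all numbers are invented for the example.

```python
def binary_mask_error(source_a, source_b):
    """Give each bin of the mix to whichever source is louder there,
    then return the error in the estimate of source_a as a fraction
    of source_a's total magnitude."""
    mix = [a + b for a, b in zip(source_a, source_b)]
    estimate_a = [
        m if a >= b else 0.0
        for m, a, b in zip(mix, source_a, source_b)
    ]
    error = sum(abs(e - a) for e, a in zip(estimate_a, source_a))
    return error / sum(source_a)

# Sources living in different bins (think bass vs. hi-hat).
disjoint = binary_mask_error([4.0, 4.0, 0.0, 0.0], [0.0, 0.0, 2.0, 2.0])

# Sources sharing every bin (think synth pad vs. strings).
overlapping = binary_mask_error([4.0, 4.0, 1.0, 1.0], [1.0, 3.0, 2.0, 2.0])

print(disjoint)     # 0.0: nothing to argue about, recovery is exact
print(overlapping)  # 0.6: shared bins bleed into the estimate
```

Even with perfect knowledge of who is louder in each bin, the overlapping case picks up the other source's energy and loses some of its own. A real model has to guess on top of that, which is why the "other" bucket is the roughest.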
03 The recording matters more than the tool
People blame the software when a stem comes out watery, but the bigger factor is usually the source. A clean, well-mixed studio recording gives the model clear ingredients to pull apart. A phone-recorded live clip, a lo-fi upload, or a heavily-processed track hands it a mix where the sources are already blurred into each other, and no amount of cleverness fully un-blurs them.
So if a result disappoints, try a cleaner version of the same song before you blame the tool. Garbage in doesn't quite mean garbage out, but it does mean a harder job. That's a limit of what the recording contains, not a bug.
04 On your device, or up to a server
There's one more design choice that quietly matters: where the separation runs. Many tools upload your audio to their servers, do the work in the cloud, and send stems back. That means your files, including anything you recorded yourself, leave your machine.
Riffloop runs the separation on your device instead, right on the YouTube video you're watching or a file you upload, so nothing is uploaded. It pulls the song into six stems: vocals, drums, bass, guitar, piano and the rest. For a practice tool you'll point at demos and lesson recordings, keeping the audio local means you never have to think about where your files end up.
05 Knowing the guess makes you use it better
Once you stop expecting perfection, stems become an incredibly useful practice tool. Trust the vocal, drum, and bass isolations. Expect the strings-and-synth bucket to be rough and don't try to transcribe fine detail out of it. Feed it the cleanest recording you can find. Do that and you'll get exactly what separation is good for: hearing a single part clearly enough to finally learn it.
It was never magic. It's a very good guess, trained on a mountain of music, running quietly on your laptop. That's more than enough to change how you practice.