Music demixing (spectrogram2spectrogram)

Has anyone built spectrogram-to-spectrogram models for music analysis, specifically for demixing tasks such as isolating vocals? I’m currently using a Mel spectrogram for both the input and the output, but I’m struggling to get good results. I’m using hop_length=512, n_fft=2048, n_mels=128. My model is currently a bidirectional GRU with 3 layers and a hidden size of 256. Does anyone know a good model type to use and/or good audio transformations for this project?
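For context, here is a rough sketch of the time-frequency grid those settings work out to. The post doesn't state a sample rate, so the 44.1 kHz used below is an assumption, and `stft_grid` is just an illustrative helper name, not a library function. The frame count assumes centered framing, which is the default in both librosa and torchaudio.

```python
def stft_grid(sample_rate, n_fft, hop_length, n_mels, duration_s):
    """Summarise the spectrogram shape and resolution implied by STFT settings."""
    n_samples = int(round(sample_rate * duration_s))
    n_bins = n_fft // 2 + 1                    # linear-frequency bins of a one-sided STFT
    n_frames = 1 + n_samples // hop_length     # centered framing (center=True)
    return {
        "n_bins": n_bins,
        "n_frames": n_frames,
        "bin_width_hz": sample_rate / n_fft,           # spacing between STFT bins
        "hop_ms": 1000.0 * hop_length / sample_rate,   # time step between frames
        "window_ms": 1000.0 * n_fft / sample_rate,     # audio covered by one frame
        "bins_per_mel_band": n_bins / n_mels,          # average compression from the mel filterbank
    }


# Settings from the post; the 44.1 kHz sample rate and 10 s clip length are assumed.
grid = stft_grid(sample_rate=44100, n_fft=2048, hop_length=512, n_mels=128, duration_s=10.0)
for name, value in grid.items():
    print(f"{name}: {value}")
```

Under these assumptions the mel filterbank folds 1025 linear bins into 128 bands and the phase is dropped, so a predicted mel spectrogram cannot be turned back into audio without an approximate inversion step.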


It’s called source separation (if isolating all sources) or enhancement (if isolating a specific one).
Searching for those terms should turn up plenty of prior work.

Juan’s model is very fascinating :blush:

If you only have the audio signal for demixing, you can also try the Hybrid Demucs model, which is built into torchaudio: Music Source Separation with Hybrid Demucs (Torchaudio 2.0.1 documentation)

There are other tools that can also do music separation, such as