Music demixing (spectrogram2spectrogram)

deadman · February 25, 2023, 10:44am

Has anyone built spectrogram-to-spectrogram models for music analysis, specifically for demixing tasks such as isolating vocals? I'm currently using a Mel spectrogram for both the input and the output, but I'm struggling to get good results. My settings are hop_length=512, n_fft=2048, and 128 Mel bands. My model is currently a bidirectional GRU with 3 layers and a hidden size of 256. Does anyone know a good model type to use and/or good audio transformations for this project?

Cheers

JuanFMontesinos · February 25, 2023, 9:13pm

This task is called source separation when you isolate all of the sources, or enhancement when you isolate one specific source.
Searching for those terms should turn up plenty of material.

nateanl · April 8, 2023, 9:17pm

Juan’s model is very fascinating

If you only have the audio signal to work with, you can also try the Hybrid Demucs model, which is built into torchaudio: Music Source Separation with Hybrid Demucs — Torchaudio 2.0.1 documentation

There are other tools that can also do music separation, such as:

Open-Unmix: GitHub - sigsep/open-unmix-pytorch: Open-Unmix - Music Source Separation for PyTorch
AudioShake: https://www.audioshake.ai/
lalal.ai: https://www.lalal.ai/