Music demixing (spectrogram2spectrogram)

Has anyone built spectrogram-to-spectrogram models for music analysis, specifically demixing tasks such as isolating vocals? I'm currently using a Mel spectrogram for both the input and the output, but I'm struggling to get good results. My settings are hop_length=512, n_fft=2048, and 128 Mel bands. The model is a bidirectional GRU with 3 layers and a hidden size of 256. Does anyone know a good model type and/or good audio transformations for this project?


The task is called source separation when you isolate all of the sources, or enhancement when you isolate one specific source.
Searching for those terms should turn up plenty of existing work.
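
To make the settings in the question concrete, here is a rough sketch of the array shapes they imply. It is only arithmetic, not a model. The 22050 Hz sample rate and the 10-second clip are my assumptions (neither is stated in the question), the helper names are made up for illustration, and the frame count follows the common centered-padding convention of 1 + num_samples // hop_length.

```python
def stft_shape(num_samples, n_fft=2048, hop_length=512, center=True):
    """Return (frequency_bins, frames) for a one-sided STFT.

    With center=True the signal is assumed to be padded by n_fft // 2 on
    each side, which gives 1 + num_samples // hop_length frames.
    """
    freq_bins = n_fft // 2 + 1
    if center:
        frames = 1 + num_samples // hop_length
    else:
        if num_samples < n_fft:
            raise ValueError("signal is shorter than one analysis window")
        frames = 1 + (num_samples - n_fft) // hop_length
    return freq_bins, frames


def mel_shape(num_samples, n_mels=128, **stft_kwargs):
    """Return (n_mels, frames): the Mel filterbank only changes the frequency axis."""
    _, frames = stft_shape(num_samples, **stft_kwargs)
    return n_mels, frames


SAMPLE_RATE = 22050        # assumption, not given in the question
CLIP_SECONDS = 10          # assumption, chosen only for illustration
num_samples = SAMPLE_RATE * CLIP_SECONDS

linear = stft_shape(num_samples)   # (frequency bins, frames) before the Mel projection
mel = mel_shape(num_samples)       # (Mel bands, frames) seen by the model

# A bidirectional recurrent layer concatenates both directions, so a hidden
# size of 256 gives 2 * 256 features per frame, which still has to be mapped
# back to the 128 Mel bands of the target.
HIDDEN_SIZE = 256
gru_features_per_frame = 2 * HIDDEN_SIZE

print("linear STFT shape:", linear)
print("Mel spectrogram shape:", mel)
print("GRU features per frame:", gru_features_per_frame)
```

One thing the numbers show is that the Mel projection collapses 1025 linear frequency bins into 128 bands, so the model's output cannot be turned back into audio without some approximate inversion step.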