Hi!
I’m working on a generative audio model, and I want to know what the most common way of choosing the channel dimension is. I’ve seen some people, after applying torch.stft or computing a spectrogram, add an extra dimension to get (batch, 1, freq_bins, length) and treat that new dimension as the channel. But I feel it’s better to use freq_bins as the channel dimension, for a more Transformer-like style: (batch, length, freq_bins) → (batch, length, latent_channels).
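To make the two layouts concrete, here is a rough sketch of what I mean. The sizes (`n_fft`, `hop`, `latent_dim`, the 8 conv channels) and the choice of a plain `nn.Conv2d` versus `nn.Linear` are just placeholder assumptions for illustration, not taken from any particular model.

```python
import torch
import torch.nn as nn

# Placeholder sizes, chosen only for illustration.
batch, samples, n_fft, hop = 2, 16000, 512, 128
wave = torch.randn(batch, samples)

# Magnitude spectrogram: (batch, freq_bins, length)
window = torch.hann_window(n_fft)
spec = torch.stft(wave, n_fft=n_fft, hop_length=hop,
                  window=window, return_complex=True)
mag = spec.abs()
freq_bins, length = mag.shape[1], mag.shape[2]

# Option A: image-like layout with a singleton channel.
# A 2D conv then slides over both frequency and time.
x_img = mag.unsqueeze(1)                      # (batch, 1, freq_bins, length)
conv2d = nn.Conv2d(1, 8, kernel_size=3, padding=1)
y_img = conv2d(x_img)                         # (batch, 8, freq_bins, length)

# Option B: sequence-like layout with freq_bins as the feature dimension.
# Each time frame becomes one token that is projected to a latent size.
x_seq = mag.transpose(1, 2)                   # (batch, length, freq_bins)
latent_dim = 64                               # hypothetical latent size
proj = nn.Linear(freq_bins, latent_dim)
y_seq = proj(x_seq)                           # (batch, length, latent_dim)

print(x_img.shape, y_img.shape)
print(x_seq.shape, y_seq.shape)
```

In option A the frequency axis is kept as a spatial axis, so the kernel is shared across frequencies; in option B every frequency bin gets its own weight in the projection and the model only sees a sequence over time.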
Any thoughts on this?
Thank you,