Convert to mel spectrogram while maintaining the generated gradient

skerlet_flandorle · October 26, 2022, 10:58am

Hi.
I am working with a VAE, and I use the following mel spectrogram settings:

sample_rate=16000
n_mels = 80
n_fft = 512
win_length = None
hop_length = 128
window_fn = torch.hann_window(80)

spectrogram = T.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=n_fft,
    n_mels=n_mels,
    win_length=win_length,
    hop_length=hop_length,
    win

During training, after the model generates raw speech, I need to convert that speech to a mel spectrogram while keeping the gradient, so that the VAE reconstruction error can be reduced.

However, when I feed the generated speech to the transform the same way I do when converting ordinary audio to a spectrogram, an error occurs:

    normalized, onesided, return_complex)
RuntimeError: stft input and window must be on the same device but got self on cuda:0 and window on cpu

Based on this error, I assumed I needed to move the hann_window to CUDA.
However,

window_fn=torch.hann_window(80).to("cuda")

If I do it like this, a different error appears, and I don't know how to solve it:

\cutevoice\lib\site-packages\torchaudio\transforms\_transforms.py", line 81, in __init__
    window = window_fn(self.win_length) if wkwargs is None else window_fn(self.win_length, **wkwargs)
TypeError: 'Tensor' object is not callable