Hi.
I am working on a VAE, and I set up the mel spectrogram transform like this:
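```python
import torch
import torchaudio.transforms as T

sample_rate = 16000
n_mels = 80
n_fft = 512
win_length = None  # None -> torchaudio uses n_fft as the window length
hop_length = 128
window_fn = torch.hann_window(80)
spectrogram = T.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=n_fft,
    n_mels=n_mels,
    win_length=win_length,
    hop_length=hop_length,
    window_fn=window_fn,
)
```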
In my pipeline, after the VAE generates a raw waveform, I need to convert it to a mel spectrogram while keeping the gradient, so that I can compute and minimize the VAE reconstruction error.
However, when I pass the generated waveform through this transform the same way I do when converting ordinary audio to a spectrogram, I get this error:
```
    normalized, onesided, return_complex)
RuntimeError: stft input and window must be on the same device but got self on cuda:0 and window on cpu
```
From this error, I assumed I needed to move `hann_window` to CUDA. However, if I do it like this:

```python
window_fn = torch.hann_window(80).to("cuda")
```

another error appears, and I don't know how to solve it:
```
\cutevoice\lib\site-packages\torchaudio\transforms\_transforms.py", line 81, in __init__
    window = window_fn(self.win_length) if wkwargs is None else window_fn(self.win_length, **wkwargs)
TypeError: 'Tensor' object is not callable
```