Need help to get torchaudio.transforms.TimeStretch to work


I’m trying to use torchaudio.transforms.TimeStretch on a logarithmic spectrogram by following the PyTorch docs, but I get the following error:

RuntimeError: The size of tensor a (1025) must match the size of tensor b (201) at non-singleton dimension 1

Full Traceback
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_27/ in <module>
      1 str = T.TimeStretch()
      2 rate = 1.2
----> 3 spec_ = str(lg_spec_db, rate)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/ in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/torchaudio/transforms/ in forward(self, complex_specgrams, overriding_rate)
   1100         else:
   1101             rate = overriding_rate
-> 1102         return F.phase_vocoder(complex_specgrams, rate, self.phase_advance)

/opt/conda/lib/python3.7/site-packages/torchaudio/functional/ in phase_vocoder(complex_specgrams, rate, phase_advance)
    777     norm_1 = complex_specgrams_1.abs()
--> 779     phase = angle_1 - angle_0 - phase_advance
    780     phase = phase - 2 * math.pi * torch.round(phase / (2 * math.pi))

How I Created the Logarithmic Spectrogram

import torch
import torchaudio
import torchaudio.transforms as T

wvfrm, sr = torchaudio.load(aud_files[0])

trans = T.Spectrogram(n_fft=2048)
spec = trans(wvfrm); spec.shape
spec_db = T.AmplitudeToDB()(spec)
lg_spec_db = (spec_db + torch.abs(spec_db.min())).log1p()

How I Tried to TimeStretch

str = T.TimeStretch()
rate = 1.2
spec_ = str(lg_spec_db, rate)

However, this results in the error I pasted above.

The shape of my spectrogram tensor is torch.Size([1, 1025, 2074]). I’ve tried playing around with the dimensions, such as excluding some of them, but with no success.

I’m not quite sure how I should go about correctly doing this.

I’d appreciate any input or suggestions! If any more information is needed, please do let me know!

Based on the error message it seems the number of frequency bins defined by n_freq in TimeStretch doesn’t match your spectrogram, and you would have to correct it (for n_fft=2048 it should be n_fft // 2 + 1 = 1025, while the default n_freq is 201).

Thanks for the response!

Right, I see. So I fiddled around with n_freq and figured out I have to set it to 1025 so that it matches the size of the second dimension of my spectrogram tensor.

However, after doing that, I get a complex64 tensor. I can’t use plt.imsave to convert the tensor to an image since it’s complex64, so I tried using torch.view_as_real. I’m not sure whether that’s the right approach, but it does seem to do the trick.