Need help getting torchaudio.transforms.TimeStretch to work

ForBo7 · February 17, 2023, 6:10am

Hello.

I’m trying to use torchaudio.transforms.TimeStretch on a logarithmic spectrogram by following the PyTorch docs, but I get the following error:

RuntimeError: The size of tensor a (1025) must match the size of tensor b (201) at non-singleton dimension 1

Full Traceback

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_27/3556733881.py in <module>
      1 str = T.TimeStretch()
      2 rate = 1.2
----> 3 spec_ = str(lg_spec_db, rate)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/torchaudio/transforms/_transforms.py in forward(self, complex_specgrams, overriding_rate)
   1100         else:
   1101             rate = overriding_rate
-> 1102         return F.phase_vocoder(complex_specgrams, rate, self.phase_advance)
   1103 
   1104 

/opt/conda/lib/python3.7/site-packages/torchaudio/functional/functional.py in phase_vocoder(complex_specgrams, rate, phase_advance)
    777     norm_1 = complex_specgrams_1.abs()
    778 
--> 779     phase = angle_1 - angle_0 - phase_advance
    780     phase = phase - 2 * math.pi * torch.round(phase / (2 * math.pi))
    781

How I Created the Logarithmic Spectrogram

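import torch
import torchaudio
import torchaudio.transforms as T

# Load the first audio file: waveform tensor and sample rate
wvfrm, sr = torchaudio.load(aud_files[0])

# Power spectrogram; n_fft=2048 gives 1025 frequency bins
trans = T.Spectrogram(n_fft=2048)
spec = trans(wvfrm); spec.shape

# Convert to decibels, shift by |min| so values are non-negative, then apply log1p
spec_db = T.AmplitudeToDB()(spec)
lg_spec_db = (spec_db + torch.abs(spec_db.min())).log1p()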

How I Tried to TimeStretch

str = T.TimeStretch()
rate = 1.2
spec_ = str(lg_spec_db, rate)

However, this results in the error I pasted above.

The shape of my spectrogram tensor is torch.Size([1, 1025, 2074]). I’ve tried playing around with the dimensions, such as excluding some of them, but with no success.

I’m not quite sure how I should go about correctly doing this.

I’d appreciate any input or suggestions! If any more information is needed, please do let me know!

ptrblck · February 17, 2023, 7:46am

Based on the error message, it seems the number of filter banks defined by n_freq in TimeStretch is not correct, so you would have to change it to match your spectrogram.
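
For reference, the two sizes in the error message can be traced back to the FFT size: a one-sided STFT with a given n_fft has n_fft // 2 + 1 frequency bins. A small sketch of that arithmetic (the helper name onesided_freq_bins is made up for illustration):

```python
def onesided_freq_bins(n_fft: int) -> int:
    """Number of frequency bins in a one-sided STFT with the given FFT size."""
    return n_fft // 2 + 1


# n_fft=2048 is the value used in the question.
# n_fft=400 is the default for torchaudio's Spectrogram, which is where
# TimeStretch's default n_freq=201 comes from.
bins_from_question = onesided_freq_bins(2048)
bins_from_default = onesided_freq_bins(400)

print(bins_from_question, bins_from_default)  # 1025 201
```

So a spectrogram built with n_fft=2048 has 1025 bins, while TimeStretch left at its defaults assumes 201.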

ForBo7 · February 17, 2023, 3:05pm

Thanks for the response!

Right, I see. So I fiddled around with n_freq and figured out I have to set it to 1025, so it matches the size of the second dimension in the spectrogram tensor.

However, after doing that, I get a complex64 tensor. I can’t use plt.imsave to convert the tensor to an image since it’s complex64, so I tried using torch.view_as_real. I’m not sure if that’s correct, but it does seem to do the trick.
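
One thing worth noting about the two options for getting a real tensor: torch.view_as_real does not combine the real and imaginary parts. It only exposes them as a trailing dimension of size 2. If the goal is an image of the spectrogram, taking the magnitude with .abs() is a common alternative. A small sketch, using a hand-made complex tensor as a stand-in for the stretched spectrogram:

```python
import torch

# Small hand-made complex tensor standing in for the stretched spectrogram.
stretched = torch.tensor([[3 + 4j, 0 + 1j]], dtype=torch.complex64)

# Magnitude: real-valued float32 tensor with the same shape as the input.
magnitude = stretched.abs()

# view_as_real: float32 tensor with an extra last dimension holding (real, imag).
as_pairs = torch.view_as_real(stretched)

print(magnitude)       # tensor([[5., 1.]])
print(as_pairs.shape)  # torch.Size([1, 2, 2])
```

The magnitude keeps a (freq, time) layout that can be saved as a 2D image directly, whereas the view_as_real result would still need one of its two channels selected or combined first.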