Different results of Griffin-Lim using torchaudio

ZhangMingxin1997 · November 14, 2023, 3:00pm

I’m trying to convert a spectrogram back to audio. I first used librosa.griffinlim, which worked well but was slow, so I am now trying torchaudio on the GPU to speed up the reconstruction. However, the reconstructed waveform differs from the librosa result.

This is my code:

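import librosa
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal

# Preprocess
data, fs = librosa.load('waveform.wav', sr=44100)
# Wn is normalized to the Nyquist frequency (fs / 2), so this passband is 10-500 Hz
b, a = signal.butter(3, [20 / fs, 1000 / fs], 'bandpass')
data = signal.filtfilt(b, a, data)
plt.plot(data)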

# STFT
DMatrix = librosa.stft(data, n_fft=2048, hop_length=int(2048 * 0.1), window='hann')
dbMatrix = librosa.amplitude_to_db(np.abs(DMatrix), ref=np.max)

With librosa, the reconstruction is similar to the original waveform:

spec = librosa.db_to_amplitude(dbMatrix)
re_wav = librosa.griffinlim(spec, n_iter=100, n_fft=2048, hop_length=int(2048 * 0.1), window='hann')
plt.plot(re_wav)

But when I switched to torchaudio, the result was different:

import torch
import torchaudio

griffinlim = torchaudio.transforms.GriffinLim(n_fft=2048, n_iter=100, hop_length=int(2048 * 0.1)).to('cuda')
spec = librosa.db_to_amplitude(dbMatrix)
re_wav = griffinlim(torch.tensor(spec).to('cuda'))
plt.plot(re_wav.cpu().detach().numpy())

What am I missing?

ZhangMingxin1997 · November 15, 2023, 4:52pm

https://stackoverflow.com/questions/77479348/different-results-of-griffin-lim-from-librosa-and-torchaudio

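Setting power=1 works, i.e. torchaudio.transforms.GriffinLim(n_fft=2048, n_iter=100, hop_length=int(2048 * 0.1), power=1). The torchaudio default is power=2.0, which treats the input as a power spectrogram, whereas librosa.griffinlim expects a magnitude spectrogram like the one I was passing in.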
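
For anyone wondering why the power argument matters, here is a minimal pure-Python sketch of the exponent convention. As far as I can tell, torchaudio's GriffinLim raises its input to 1 / power before iterating, and power defaults to 2.0, so a magnitude spectrogram gets square-rooted unless power=1 is passed. The helper name undo_power and the sample values are made up for illustration; this is not torchaudio or librosa code.

```python
def undo_power(spec_values, power):
    """Recover magnitudes from values assumed to be magnitude ** power.

    Hypothetical helper that mimics the rescaling step only; it is not
    part of torchaudio or librosa.
    """
    return [v ** (1.0 / power) for v in spec_values]


# Magnitude values, like np.abs(stft) or the output of librosa.db_to_amplitude.
magnitude = [0.25, 1.0, 4.0]

# power=2.0 treats the input as a power spectrogram and square-roots it,
# which distorts an input that was already a magnitude.
assumed_power = undo_power(magnitude, power=2.0)

# power=1.0 leaves the magnitudes untouched, which is the librosa convention.
assumed_magnitude = undo_power(magnitude, power=1.0)

print(assumed_power)
print(assumed_magnitude)
```

With the default, the phase reconstruction runs on rescaled magnitudes, which would explain the mismatch with librosa; with power=1 both libraries start from the same magnitudes.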