Different results of Griffin-Lim using torchaudio

I’m trying to convert a spectrogram back to audio. I first used librosa.griffinlim, which worked well but was slow, so I am now trying torchaudio on the GPU to speed up the conversion. However, the reconstruction I get differs from the one librosa produces.

This is my code:

import librosa
import numpy as np
import torch
import torchaudio
from scipy import signal

# Preprocess
data, fs = librosa.load('waveform.wav', sr=44100)
b, a = signal.butter(3, [20 / fs, 1000 / fs], 'bandpass')
data = signal.filtfilt(b, a, data)

DMatrix = librosa.stft(data, n_fft=2048, hop_length=int(2048 * 0.1), window='hann')
dbMatrix = librosa.amplitude_to_db(np.abs(DMatrix), ref=np.max)

And I obtained results similar to the original waveform using librosa:

spec = librosa.db_to_amplitude(dbMatrix)
re_wav = librosa.griffinlim(spec, n_iter=100, n_fft=2048, hop_length=int(2048 * 0.1), window='hann')

But when I switched to torchaudio, the result was different:

griffinlim = torchaudio.transforms.GriffinLim(n_fft=2048, n_iter=100, hop_length=int(2048 * 0.1)).to('cuda')
spec = librosa.db_to_amplitude(dbMatrix)
re_wav = griffinlim(torch.tensor(spec).to('cuda'))

What am I missing?


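Setting power=1 works. torchaudio.transforms.GriffinLim defaults to power=2.0, so it treats its input as a power spectrogram, whereas librosa.griffinlim expects a magnitude spectrogram such as the one returned by librosa.db_to_amplitude.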