Gradient of a variable becomes NaN after the first batch

tuan3w · October 20, 2018, 5:58pm

Hi all,
I’m developing a neural vocoder with an STFT-based loss. Since switching to PyTorch 0.4.1, the loss becomes NaN after the first batch. Reducing the learning rate did not help; the problem persists.
I wrote a small script to reproduce it:

import numpy as np
import torch

def cal_spec(signal, n_fft=2048, hop_length=256, win_length=1024):
    # Create the window on the same device as the input (x below lives on the CPU).
    window = torch.hann_window(win_length, device=signal.device)
    complex_spectrogram = torch.stft(
            signal, n_fft=n_fft, hop_length=hop_length, win_length=win_length, window=window, center=False)
    power_spectrogram = complex_spectrogram[:, :, :, 0] ** 2 + complex_spectrogram[:, :, :, 1] ** 2
    return torch.sqrt(power_spectrogram)

grads = {}
def add_grad(name, x):
    grads[name] = x

def reg(name):
    return lambda x: add_grad(name, x)


# file can be downloaded from:
# https://drive.google.com/file/d/1qxTIKLcSShBcfX3kIgtf5scJSlQKtrPa/view?usp=sharing
d = np.load('a.npy.npz')
x = torch.tensor(d['pred'], requires_grad=True)
y = torch.tensor(d['target'], requires_grad=False)

pred_spec = cal_spec(x)
target_spec = cal_spec(y)
x.register_hook(reg('x'))
pred_spec.register_hook(reg('pred_spec'))

loss = torch.mean(torch.abs(pred_spec - target_spec))
loss.backward()

Inspecting the hooks, I see that the gradient of pred_spec is fine, but the gradient of x is NaN.
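For readers without the linked data file, here is a minimal self-contained sketch of the same symptom. It assumes PyTorch 1.8 or newer, where `torch.stft` takes `return_complex=True` and `torch.view_as_real` recovers the old `(..., 2)` real/imaginary layout that 0.4.x returned. The synthetic signal with a silent stretch is a hypothetical stand-in for `a.npy.npz`; whether the original data actually contains exactly-zero frames is an assumption. A frame that is entirely zero yields exactly-zero STFT bins, and backpropagating through `torch.sqrt` at those bins puts NaN into the gradient of `x` while the loss itself stays finite.

```python
import torch

def cal_spec(signal, n_fft=2048, hop_length=256, win_length=1024):
    # Same magnitude computation as the original cal_spec, adapted to the
    # complex-returning torch.stft API (PyTorch >= 1.8, an assumption here).
    window = torch.hann_window(win_length, device=signal.device)
    complex_spectrogram = torch.view_as_real(torch.stft(
        signal, n_fft=n_fft, hop_length=hop_length, win_length=win_length,
        window=window, center=False, return_complex=True))
    power_spectrogram = complex_spectrogram[..., 0] ** 2 + complex_spectrogram[..., 1] ** 2
    return torch.sqrt(power_spectrogram)

torch.manual_seed(0)
# Hypothetical input: noise with a silent stretch longer than n_fft, so
# several frames (and therefore all their STFT bins) are exactly zero.
signal = torch.randn(1, 16384)
signal[:, 4096:8192] = 0.0
x = signal.clone().requires_grad_(True)
y = torch.randn(1, 16384)

loss = torch.mean(torch.abs(cal_spec(x) - cal_spec(y)))
loss.backward()

# The loss is finite, but x.grad contains NaN for samples in the silent frames.
print(torch.isfinite(loss).item(), torch.isnan(x.grad).any().item())
```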
Thanks.

tom · October 20, 2018, 6:49pm

Maybe the non-differentiability of sqrt at 0 is causing you trouble? If so, you could add a small constant (but beware: the square root of a small constant isn’t nearly as small) or just drop the square root and work with the squares.
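Why the gradient is NaN rather than merely large: for a bin with real part a and imaginary part b, the magnitude is sqrt(p) with p = a² + b², and the chain rule gives ∂sqrt(p)/∂a = 1 / (2·sqrt(p)) · 2a. At a = b = 0 the first factor is inf and the second is 0, and inf · 0 evaluates to NaN in floating point. The sketch below compares the plain magnitude with the two remedies suggested above on an all-zero input. The helper names and `eps=1e-8` are illustrative choices, not taken from the thread. With the squared option, the loss should compare against the target's power spectrogram; the target of ones used here is its own square.

```python
import torch

def magnitude(re, im):
    # Plain magnitude, as in cal_spec: d sqrt(p)/dp = 1 / (2 sqrt(p)) is inf at p = 0.
    return torch.sqrt(re ** 2 + im ** 2)

def magnitude_eps(re, im, eps=1e-8):
    # Option 1: a small constant inside the sqrt keeps the derivative finite,
    # but a zero bin now has magnitude sqrt(eps) = 1e-4 instead of 0.
    return torch.sqrt(re ** 2 + im ** 2 + eps)

def power(re, im):
    # Option 2: drop the sqrt; the gradient (2*re, 2*im) is defined everywhere.
    return re ** 2 + im ** 2

def grads_at_zero(fn):
    # L1 loss against a target of ones, evaluated at an all-zero "spectrogram";
    # returns the gradients w.r.t. the real and imaginary parts.
    re = torch.zeros(4, requires_grad=True)
    im = torch.zeros(4, requires_grad=True)
    loss = torch.mean(torch.abs(fn(re, im) - torch.ones(4)))
    return torch.autograd.grad(loss, (re, im))

for fn in (magnitude, magnitude_eps, power):
    g_re, g_im = grads_at_zero(fn)
    print(fn.__name__, torch.isnan(g_re).any().item(), torch.isnan(g_im).any().item())
# magnitude True True
# magnitude_eps False False
# power False False
```

Both remedies keep the gradient finite at silent bins; the squared version avoids the sqrt(eps) floor at the cost of changing the loss scale.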

Best regards

Thomas

tuan3w · October 21, 2018, 12:36am

Hi @tom,
Thanks for your reply. Yes, the non-differentiability of sqrt at 0 was causing the problem. I didn’t see it in PyTorch 0.4.0, so perhaps the STFT computation changed between 0.4.0 and 0.4.1.