Getting nan gradient with custom loss function

kgoyal40 · July 28, 2020, 1:19pm

Hi,

I am writing a custom cross-entropy function, and the aim is to get the gradients for some model parameters. I get ‘nan’ gradients for the parameters. What is incorrect here? The following is the code I am using.

weights = an input vector of model parameters
X = a dataset with # features = len(weights) - 1
sumproduct = a function sumproduct(X, W) = w0 + w1*x1 + ... + wn*xn
weight_tensors = torch.tensor(weights, requires_grad=True)
y_hat = [sumproduct(list(X.iloc[i, :]), weight_tensors) for i in range(X.shape[0])]
prob = [1 / (1 + torch.exp(-1 * y_hat[i])) for i in range(len(y_hat))]

loss = -sum([torch.log(prob[i] + torch.exp(torch.tensor([-10], dtype=torch.float32)))
                     if y[i] == 1 else torch.log(1 - prob[i] + torch.exp(torch.tensor([-10], 
                     dtype=torch.float32)))
                     for i in range(len(y))]) / len(y)

loss.register_hook(print)        
loss.backward()
gradients = weight_tensors.grad
print(gradients)

Thanks

ptrblck · July 30, 2020, 8:01am

My first guess would be that a negative value passed to torch.log is creating the NaNs.
Could you check whether this is the case in your code?
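
A minimal sketch of how that check could look, under stated assumptions: the data (X, y, w) below is made up for illustration, since the original dataset and weights are not shown in the post, and the per-row Python loops are replaced by a vectorized equivalent. It inspects the arguments passed to torch.log and also shows a second way NaNs can appear in the gradient, namely torch.exp overflowing in the hand-written sigmoid. Whether either of these is what happens with the original data is not confirmed in this thread. For comparison, it uses F.binary_cross_entropy_with_logits, which works on the logits directly; note this is a swapped-in loss that drops the exp(-10) offset, so its value is not identical to the loss in the post.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy data (NOT from the post); the second row is deliberately large.
X = torch.tensor([[1.0, 2.0], [200.0, 300.0], [-1.0, 0.5]])
y = torch.tensor([1.0, 0.0, 1.0])
w = torch.tensor([0.1, -0.5, -0.5], requires_grad=True)  # w[0] plays the role of w0

# Vectorized form of sumproduct(X, W) = w0 + w1*x1 + ... + wn*xn
logits = w[0] + X @ w[1:]
prob = 1 / (1 + torch.exp(-logits))        # sigmoid written as in the post
eps = torch.exp(torch.tensor(-10.0))       # same offset as in the post

# The value that ends up inside torch.log for each sample
log_arg = torch.where(y == 1, prob + eps, 1 - prob + eps)

# Check 1: is anything non-positive going into torch.log?
print("non-positive log argument:", bool((log_arg <= 0).any()))
# Check 2: did torch.exp overflow on the way to prob?
print("exp(-logits) overflowed:", bool(torch.isinf(torch.exp(-logits)).any()))

naive_loss = -torch.log(log_arg).mean()
naive_loss.backward()
naive_grad = w.grad.clone()
print("naive gradient:", naive_grad)

# Comparison: a loss computed from the logits, without an explicit exp/log chain.
w2 = w.detach().clone().requires_grad_(True)
logits2 = w2[0] + X @ w2[1:]
stable_loss = F.binary_cross_entropy_with_logits(logits2, y)
stable_loss.backward()
stable_grad = w2.grad.clone()
print("logits-based gradient:", stable_grad)
```

With these made-up values, every torch.log argument is positive, yet the first gradient should still contain NaN: exp(-logits) overflows to inf for the large row, and the backward pass of exp then multiplies a zero gradient by inf. Running the same two checks on the real X and weights should tell which of the two cases, if either, applies. torch.autograd.set_detect_anomaly(True) can also help locate the operation that first produces a NaN during backward.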