Hi,
I am writing a custom cross-entropy function, and the aim is to get the gradients for some model parameters. I get `nan` gradients for the parameters. What is incorrect here? Below is the code I am using.
weights = an input vector of model parameters
X = a dataset with # features = len(weights) - 1
y = the binary labels (0 or 1) for the rows of X
sumproduct = a function sumproduct(x, w) = w0 + w1*x1 + ... + wn*xn
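For context, here is a minimal sketch of what that sumproduct helper could look like (an assumption, since its implementation is not shown in the post): w0 acts as the bias and the remaining weights multiply the features one by one, so the result stays a tensor that tracks gradients.

```python
import torch

def sumproduct(x_row, weight_tensors):
    # x_row: list of feature values [x1, ..., xn]
    # weight_tensors: 1-D tensor [w0, w1, ..., wn] with requires_grad=True
    total = weight_tensors[0]  # bias term w0
    for j, xj in enumerate(x_row):
        total = total + weight_tensors[j + 1] * xj
    return total
```

Because every operation here goes through torch tensor arithmetic, autograd can trace the result back to `weight_tensors`.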
weight_tensors = torch.tensor(weights, requires_grad=True)
y_hat = [sumproduct(list(X.iloc[i, :]), weight_tensors) for i in range(X.shape[0])]
prob = [1 / (1 + torch.exp(-1 * y_hat[i])) for i in range(len(y_hat))]
eps = torch.exp(torch.tensor([-10], dtype=torch.float32))
loss = -sum([torch.log(prob[i] + eps) if y[i] == 1 else torch.log(1 - prob[i] + eps)
             for i in range(len(y))]) / len(y)
loss.register_hook(print)
loss.backward()
gradients = weight_tensors.grad
print(gradients)
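For anyone who wants to reproduce this, here is a self-contained, vectorized version of the same computation. The data X, y and the initial weights are made up for illustration (the originals are not shown in the post); the exp(-10) term plays the same log() guard role as in the code above.

```python
import torch

torch.manual_seed(0)
n_samples, n_features = 8, 3
X = torch.randn(n_samples, n_features)           # stand-in dataset
y = torch.randint(0, 2, (n_samples,)).float()    # stand-in binary labels

# [w0, w1, ..., wn]: bias followed by one weight per feature
weight_tensors = torch.tensor([0.1, -0.2, 0.3, 0.05], requires_grad=True)

# w0 + w1*x1 + ... + wn*xn for every row at once
y_hat = weight_tensors[0] + X @ weight_tensors[1:]
prob = torch.sigmoid(y_hat)  # same as 1 / (1 + exp(-y_hat))

# cross-entropy with the exp(-10) guard from the original code
eps = torch.exp(torch.tensor(-10.0))
loss = -(y * torch.log(prob + eps) + (1 - y) * torch.log(1 - prob + eps)).mean()
loss.backward()
print(weight_tensors.grad)
```

The vectorized form avoids the per-row Python loops, which also makes it easier to spot where overflow or a log of a non-positive value could produce nan.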
Thanks