I am creating a custom cross-entropy function, and the aim is to get the gradients for some model parameters. I get `nan` grad for the parameters. What is incorrect here? Following is the code I am using.
Setup:

- `weights` = an input vector of model parameters
- `X` = a dataset with # features = `len(weights) - 1`
- `sumproduct` = a function: `sumproduct(X, W) = w0 + w1*x1 + ... + wn*xn`

```python
weight_tensors = torch.tensor(weights, requires_grad=True)
y_hat = [sumproduct(list(X.iloc[i, :]), weight_tensors) for i in range(X.shape[0])]
prob = [1 / (1 + torch.exp(-1 * y_hat[i])) for i in range(len(y_hat))]
loss = -sum(
    [
        torch.log(prob[i] + torch.exp(torch.tensor([-10], dtype=torch.float32)))
        if y[i] == 1
        else torch.log(1 - prob[i] + torch.exp(torch.tensor([-10], dtype=torch.float32)))
        for i in range(len(y))
    ]
) / len(y)
loss.register_hook(print)
loss.backward()
gradients = weight_tensors.grad
print(gradients)