Hello,
I’m trying to get d(loss)/d(input). I know I have two options.
First option:
```python
loss.backward()
dlossdx = x.grad  # .data is legacy; x.grad is the gradient tensor
```
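For context, here is a minimal runnable sketch of the first option (the model and the `x`, `y_hat`, `labels` names are just a made-up setup, not my real code):

```python
import torch
import torch.nn as nn

# Made-up setup: a tiny linear classifier on random data.
x = torch.randn(4, 10, requires_grad=True)   # input must require grad
labels = torch.randint(0, 3, (4,))
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()

y_hat = model(x)
loss = criterion(y_hat, labels)              # scalar (mean over batch)
loss.backward()
dlossdx = x.grad                             # d(loss)/d(input), shape (4, 10)
```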
Second option:
```python
criterion = nn.CrossEntropyLoss(reduction='none')  # reduce=False is deprecated
loss = criterion(y_hat, labels)
# No need to call backward.
dlossdx = torch.autograd.grad(outputs=loss,
                              inputs=x,
                              grad_outputs=?)
```
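Here is a runnable version of what I’ve tried so far, plugging in `torch.ones_like(loss)` as a guess for `grad_outputs` (same made-up setup as above):

```python
import torch
import torch.nn as nn

# Same made-up setup as in the first option.
x = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 3, (4,))
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss(reduction='none')  # per-sample losses

y_hat = model(x)
loss = criterion(y_hat, labels)                    # shape (4,), not a scalar
# My guess: seed each per-sample loss with an upstream gradient of 1.
dlossdx = torch.autograd.grad(outputs=loss, inputs=x,
                              grad_outputs=torch.ones_like(loss))[0]
```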
My question is: if I use cross-entropy loss, what should I pass as `grad_outputs` in the second option?
Do I pass d(CE)/d(y_hat)? Since PyTorch’s cross-entropy includes the softmax, that would require me to pre-compute the softmax derivative using the Kronecker delta (sketched below).
Or do I pass d(CE)/d(CE), which is torch.ones_like(loss)?
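For reference, the softmax derivative I mean in the first case is the Jacobian ∂p_i/∂z_j = p_i (δ_ij − p_j). A quick sketch of pre-computing it for one sample, just to illustrate what I’d rather avoid:

```python
import torch

z = torch.randn(3)                     # logits for one sample
p = torch.softmax(z, dim=0)
# Jacobian: J[i, j] = p[i] * (delta_ij - p[j]) = diag(p) - p p^T
J = torch.diag(p) - torch.outer(p, p)
```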
A conceptual answer is fine.