Hello,
I’m trying to get `d(loss)/d(input)`. As far as I know, I have two options.

First option:

```python
loss.backward()
```

Second option:

```python
criterion = nn.CrossEntropyLoss(reduction='none')  # per-sample losses
loss = criterion(y_hat, labels)
# No need to call backward:
grads = torch.autograd.grad(loss, inputs, grad_outputs=...)
```
My question is: if I use cross-entropy loss, what should I pass as `grad_outputs` in the second option?
Do I pass `d(CE)/d(y_hat)`? Since PyTorch's cross-entropy includes the softmax, that would require me to pre-compute the softmax derivative by hand using the Kronecker delta.
Or do I pass `d(CE)/d(CE)`, which is just `torch.ones_like(loss)`?
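For context, here is a minimal sketch of what I mean by the two options, on a hypothetical toy model (`x`, `model`, and the shapes are made up for illustration). It seeds `grad_outputs` with `torch.ones_like(loss)`, i.e. the `d(CE)/d(CE)` interpretation, and compares against `backward()` on the summed loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: 4 samples, 5 features, 3 classes
x = torch.randn(4, 5, requires_grad=True)
model = nn.Linear(5, 3)
y_hat = model(x)
labels = torch.tensor([0, 2, 1, 0])

criterion = nn.CrossEntropyLoss(reduction='none')  # per-sample losses, shape (4,)
loss = criterion(y_hat, labels)

# Second option: seed with d(CE)/d(CE) = 1 per sample; autograd applies
# the softmax/cross-entropy Jacobian internally.
grad_input, = torch.autograd.grad(loss, x,
                                  grad_outputs=torch.ones_like(loss),
                                  retain_graph=True)

# First option: backpropagating the summed per-sample losses.
loss.sum().backward()

# Both paths give the same d(loss)/d(input)
print(torch.allclose(grad_input, x.grad))
```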