grad_outputs in torch.autograd.grad (CrossEntropyLoss)

I’m trying to compute d(loss)/d(input), and I know of two ways to do it.

First option:

        dlossdx =

Second option:

        # criterion = nn.CrossEntropyLoss(reduction='none')  # `reduce=False` is the deprecated spelling
        # loss = criterion(y_hat, labels)  # unreduced: one loss value per sample
        # No need to call backward.
        dlossdx = torch.autograd.grad(outputs=loss,
                                      inputs=x,
                                      grad_outputs=?)

My question is: if I use cross-entropy loss without reduction, what should I pass as grad_outputs in the second option?

Do I pass d(CE)/d(y_hat)? Since PyTorch's CrossEntropyLoss applies the softmax internally, that would require me to work out the softmax derivative by hand using the Kronecker delta.
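
For reference, the hand derivation mentioned here reduces to a well-known closed form: for softmax followed by cross-entropy, d(CE)/d(y_hat) is softmax(y_hat) minus the one-hot labels. Below is a minimal sketch that checks this against autograd. The tensor shapes and values are made up for illustration; only the names y_hat and labels come from the post.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical logits: 4 samples, 3 classes (not from the original post).
y_hat = torch.randn(4, 3, requires_grad=True)
labels = torch.tensor([0, 2, 1, 2])

# Per-sample cross-entropy, shape (4,).
loss = F.cross_entropy(y_hat, labels, reduction="none")

# Gradient of the summed loss w.r.t. the logits, computed by autograd.
(auto_grad,) = torch.autograd.grad(loss.sum(), y_hat)

# Closed form obtained from the Kronecker-delta derivation.
manual_grad = F.softmax(y_hat, dim=1) - F.one_hot(labels, num_classes=3).float()

matches = torch.allclose(auto_grad, manual_grad.detach(), atol=1e-6)
print(matches)
```

Autograd already applies this derivative when it walks back through the loss, so it does not have to be supplied by the caller.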

Or do I pass d(CE)/d(CE), which is torch.ones_like(loss)?

A conceptual answer is fine.
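
To make the second option concrete: grad_outputs is the vector in the vector-Jacobian product, so it must have the same shape as outputs (here, the per-sample loss), and each entry weights the corresponding loss element. Passing torch.ones_like(loss) should therefore give the same result as differentiating loss.sum(). The sketch below checks that under assumed conditions: the nn.Linear model, the shapes, and the random data are hypothetical stand-ins for whatever produces y_hat.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup (not from the original post).
x = torch.randn(4, 5, requires_grad=True)
model = nn.Linear(5, 3)
labels = torch.tensor([0, 2, 1, 2])

criterion = nn.CrossEntropyLoss(reduction="none")
y_hat = model(x)
loss = criterion(y_hat, labels)  # shape (4,), one value per sample

# grad_outputs has the shape of `loss`; all-ones weights every sample equally.
(grad_ones,) = torch.autograd.grad(
    outputs=loss,
    inputs=x,
    grad_outputs=torch.ones_like(loss),
    retain_graph=True,  # keep the graph for the comparison below
)

# Equivalent formulation: reduce to a scalar first, then omit grad_outputs.
(grad_sum,) = torch.autograd.grad(outputs=loss.sum(), inputs=x)

same = torch.allclose(grad_ones, grad_sum)
print(same)
```

If the mean rather than the sum is wanted, the same idea applies with grad_outputs filled with 1/N instead of ones.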
