# Understanding autograd calculation of backprop

I am attempting to re-implement backpropagation on my own for didactic purposes, but am running into some issues. I am trying to work backwards from a simple network, starting with LogSoftmax + NLLLoss, but I am unable to match the gradient of the input to the LogSoftmax layer as calculated by autograd.

```
import torch
import torch.nn

new_relu_feats = torch.tensor([[1., 0., 3.]], requires_grad=True)
logits = torch.nn.LogSoftmax(dim=1)(new_relu_feats)
label = torch.LongTensor([1])  # the correct class is index 1 (the middle column)
loss = torch.nn.NLLLoss()(logits, label)
loss.backward()
new_relu_feats.grad  # gradient computed by autograd, the target to match

sm = torch.nn.Softmax(dim=1)(new_relu_feats.detach())
dloss_dlogits = torch.tensor([[0., -1., 0.]])
dlogits_dnew_relu_feat = torch.tensor([[[1 - sm[0, 0], -sm[0, 0], -sm[0, 0]],
                                        [-sm[0, 1], 1 - sm[0, 1], -sm[0, 1]],
                                        [-sm[0, 2], -sm[0, 2], 1 - sm[0, 2]]]])

dloss_dlogits * dlogits_dnew_relu_feat
# This does not entirely match above, but the middle column does (corresponding to the correct class)
```

Is this correct, and is the matrix eventually reduced for efficiency by simply selecting the portion that is non-zero (i.e., the column corresponding to the correct class)?

Hi,

You can use triple backticks ``` before and after your code to have nicer formatting.

The combination of gradients is not an element-wise product but a matrix-matrix multiplication.
You can do `torch.bmm(dlogits_dnew_relu_feat, dloss_dlogits.unsqueeze(-1)).squeeze(-1)` to get what you want.
The unsqueeze/squeeze on the last dimension is just there to add a dummy dimension of size 1 to make `bmm` happy.

Thank you so much! That did the trick and I can match these layers, along with a different example that includes a ReLU before the softmax. I am still a little unclear why it is `dlogits_dnew_relu_feat @ dloss_dlogits` rather than the other way around (`dloss_dlogits @ dlogits_dnew_relu_feat`) from a conceptual point of view.
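For reference, here is the check that now matches autograd for me. A minimal sketch: `requires_grad=True` is added so autograd populates the input's `.grad`, and the label is assumed to be class 1, as in my original snippet.

```
import torch

new_relu_feats = torch.tensor([[1., 0., 3.]], requires_grad=True)
logits = torch.nn.LogSoftmax(dim=1)(new_relu_feats)
loss = torch.nn.NLLLoss()(logits, torch.LongTensor([1]))  # assumed label: class 1
loss.backward()

sm = torch.nn.Softmax(dim=1)(new_relu_feats.detach())
dloss_dlogits = torch.tensor([[0., -1., 0.]])
# the matrix as I originally built it: entry [i, j] = delta_ij - softmax_i
dlogits_dnew_relu_feat = (torch.eye(3) - sm.t()).unsqueeze(0)  # shape [1, 3, 3]

manual_grad = torch.bmm(dlogits_dnew_relu_feat, dloss_dlogits.unsqueeze(-1)).squeeze(-1)
print(torch.allclose(manual_grad, new_relu_feats.grad))  # True
```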

Ah, I see now. The derivative for LogSoftmax was transposed from how I was thinking about it conceptually:

```
dlogits_dnew_relu_feat = torch.tensor([[[1 - sm[0, 0], -sm[0, 0], -sm[0, 0]],
                                        [-sm[0, 1], 1 - sm[0, 1], -sm[0, 1]],
                                        [-sm[0, 2], -sm[0, 2], 1 - sm[0, 2]]]])
```

should be

```
dlogits_dnew_relu_feat = torch.tensor([[[1 - sm[0, 0], -sm[0, 1], -sm[0, 2]],
                                        [-sm[0, 0], 1 - sm[0, 1], -sm[0, 2]],
                                        [-sm[0, 0], -sm[0, 1], 1 - sm[0, 2]]]])
```

and now you can do `dloss_dlogits @ dlogits_dnew_relu_feat` with no need for squeeze/unsqueeze.
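A quick self-contained check of that form, under the same assumptions as the sketch above (label fixed to class 1, `requires_grad=True` on the input):

```
import torch

new_relu_feats = torch.tensor([[1., 0., 3.]], requires_grad=True)
logits = torch.nn.LogSoftmax(dim=1)(new_relu_feats)
torch.nn.NLLLoss()(logits, torch.LongTensor([1])).backward()

sm = torch.nn.Softmax(dim=1)(new_relu_feats.detach())
dloss_dlogits = torch.tensor([[0., -1., 0.]])
# corrected Jacobian: entry [i, j] = d logits_i / d x_j = delta_ij - softmax_j
dlogits_dnew_relu_feat = (torch.eye(3) - sm).unsqueeze(0)          # shape [1, 3, 3]
manual_grad = (dloss_dlogits @ dlogits_dnew_relu_feat).squeeze(1)  # vJP, shape [1, 3]
print(torch.allclose(manual_grad, new_relu_feats.grad))            # True
```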
