We can get the gradient of the input like this:
>>> import torch
>>> import torch.nn as nn
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>> input.grad
I expected input.grad to equal softmax(input) for the non-target classes, and softmax(input) - 1 for the target class, but the result above does not match that.
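If it helps to reproduce the comparison: one likely source of the mismatch is that `nn.CrossEntropyLoss` defaults to `reduction='mean'`, which divides the gradient by the batch size. A minimal sketch (assuming that default) checking `input.grad` against `(softmax(input) - one_hot(target)) / N`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

loss = nn.CrossEntropyLoss()  # default reduction='mean'
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
loss(input, target).backward()

# With reduction='mean', the analytic gradient of the loss w.r.t. input is
# (softmax(input) - one_hot(target)) / batch_size
probs = F.softmax(input.detach(), dim=1)
one_hot = F.one_hot(target, num_classes=5).float()
expected = (probs - one_hot) / input.shape[0]

print(torch.allclose(input.grad, expected))  # prints True
```

With `reduction='sum'` (or per-sample losses) the `1 / batch_size` factor disappears and the gradient matches softmax(input) minus the one-hot target directly.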
PS: why does this site not send email notifications?