Thank you for the detailed reply.
I checked the gradients. The gradients are the same, but training behaves very differently.
I have no idea why. Is there any other possible cause?
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

N = 5
C = 3
input = Variable(torch.randn(N, C), requires_grad=True)
target = Variable(torch.zeros(N).random_(0, C).long())

# Reference: standard cross entropy as log_softmax + NLLLoss
loss = nn.NLLLoss()(F.log_softmax(input, dim=1), target.view(N))
print(loss)
loss.backward()
print(input.grad)
Variable containing:
1.8412
[torch.FloatTensor of size 1]
Variable containing:
0.0994 -0.1139 0.0145
0.1437 0.0180 -0.1618
0.0343 -0.1474 0.1131
-0.1896 0.1606 0.0290
0.0788 -0.1821 0.1033
[torch.FloatTensor of size 5x3]
# Reset the gradient before the second backward pass
input.grad = input.grad * 0

# Custom focal loss; with gamma=0 it should reduce to plain cross entropy
loss = MultiClassFocalLoss(gamma=0)(input, target)
print(loss)
loss.backward()
print(input.grad)
Variable containing:
1.8412
[torch.FloatTensor of size 1]
Variable containing:
0.0994 -0.1139 0.0145
0.1437 0.0180 -0.1618
0.0343 -0.1474 0.1131
-0.1896 0.1606 0.0290
0.0788 -0.1821 0.1033
[torch.FloatTensor of size 5x3]
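
For reference, with gamma = 0 a focal loss should reduce to plain cross entropy, which is why I expected the two to match. Below is only a minimal sketch of the textbook formulation FL(p_t) = -(1 - p_t)^gamma * log(p_t), not my full MultiClassFocalLoss implementation:

class FocalLossSketch(nn.Module):
    # Minimal sketch: FL(p_t) = -(1 - p_t)^gamma * log(p_t), averaged over the batch
    def __init__(self, gamma=0):
        super(FocalLossSketch, self).__init__()
        self.gamma = gamma

    def forward(self, input, target):
        logpt = F.log_softmax(input, dim=1)                     # (N, C) log-probabilities
        logpt = logpt.gather(1, target.view(-1, 1)).squeeze(1)  # log p_t of the true class
        pt = logpt.exp()                                        # p_t
        return (-((1 - pt) ** self.gamma) * logpt).mean()       # gamma=0 -> plain NLL

With gamma = 0 the (1 - pt) ** gamma factor is 1, so both the loss value and the gradient should be identical to NLLLoss on log_softmax, which matches the two printouts above.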