I ran the following code in PyTorch 0.3. It gave me different gradient values (printed on the last line) for different values of “number”. In version 0.4, however, the computed gradient is stable and does not change with “number”, which matches the mathematics: it should not depend on “number” at all. What changed between versions 0.3 and 0.4 to cause this? Or could anyone explain why the gradient is unstable in version 0.3? My guess is that one reason is that the gradient is almost zero at the given data point, which causes the instability, but then why does it not happen in version 0.4? If the gradient is this unstable, any analysis that relies on precise gradient values seems unreliable. Any help is appreciated.

import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

# logsumexp(tensor, dim) is a helper defined elsewhere (not shown).

number = 2

crafting_Z = np.array([[-87.63061, 9.214709, 8.268724, 1.587333,
                        19.902473, -63.422, -172.77057, 323.4511,
                        -107.37048, -46.83583],
                       [31.731297, -10.696803, 341.1024, -57.52041,
                        -45.01423, -162.55489, 28.096256, -11.275765,
                        26.65572, -107.7822],
                       [-62.576157, 184.25623, -28.676, -67.575714,
                        32.881653, 1.1464163, 8.9399185, -16.74968,
                        -27.357265, -36.670937]])
targets = np.array([7, 2, 1])

crafting_Z = Variable(torch.from_numpy(crafting_Z), requires_grad=True)
targets = Variable(torch.from_numpy(targets), requires_grad=False)

# Stack `number` identical copies of log_softmax(crafting_Z) along a new dim 2.
temp = (crafting_Z - logsumexp(crafting_Z, 1).unsqueeze(1)).unsqueeze(2)
for i in range(number - 1):
    temp = torch.cat([temp, (crafting_Z - logsumexp(crafting_Z, 1).unsqueeze(1)).unsqueeze(2)], dim=2)

# Reduce over the copies, then take the NLL loss.
output = logsumexp(temp, 2)

if crafting_Z.grad is not None:
    crafting_Z.grad.data.zero_()

loss = F.nll_loss(output, targets)
loss.backward()
print(crafting_Z.grad.data)
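
As a sanity check on the claim that the gradient should not depend on “number”, here is a minimal sketch in plain Python (float64, no PyTorch). It assumes logsumexp is the usual max-shifted implementation, which may differ from the helper used above. Every slice of temp is log_softmax(crafting_Z), so the outer logsumexp over “number” identical slices only adds log(number), and the gradient should be (softmax(Z) - onehot) / N for any “number”. The names logsumexp, softmax, and loss_and_grad are illustrative, and the backward pass is written by hand; this is not what either PyTorch version does internally.

```python
import math

def logsumexp(xs):
    """Max-shifted log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax(xs):
    """Softmax computed as exp(x - logsumexp(xs))."""
    lse = logsumexp(xs)
    return [math.exp(x - lse) for x in xs]

def loss_and_grad(Z, targets, number):
    """Forward pass of the posted computation plus a hand-written backward pass."""
    n_rows = len(Z)
    loss, grad = 0.0, []
    for row, t in zip(Z, targets):
        lse = logsumexp(row)
        log_p = [z - lse for z in row]        # log_softmax of this row
        copies = [log_p[t]] * number          # the stacked slices at the target class
        loss += -logsumexp(copies) / n_rows   # nll_loss, averaged over rows
        # Backward: the outer logsumexp sends weight softmax(copies)[k] to copy k.
        # The weights sum to 1, which is why `number` drops out of the gradient.
        upstream = -sum(softmax(copies)) / n_rows
        p = softmax(row)
        # d log_p[t] / d z_j = [j == t] - p[j]
        grad.append([upstream * ((1.0 if j == t else 0.0) - p[j])
                     for j in range(len(row))])
    return loss, grad

# Same values as crafting_Z and targets in the post.
Z = [
    [-87.63061, 9.214709, 8.268724, 1.587333, 19.902473,
     -63.422, -172.77057, 323.4511, -107.37048, -46.83583],
    [31.731297, -10.696803, 341.1024, -57.52041, -45.01423,
     -162.55489, 28.096256, -11.275765, 26.65572, -107.7822],
    [-62.576157, 184.25623, -28.676, -67.575714, 32.881653,
     1.1464163, 8.9399185, -16.74968, -27.357265, -36.670937],
]
targets = [7, 2, 1]

for number in (1, 2, 5):
    loss, grad = loss_and_grad(Z, targets, number)
    largest = max(abs(g) for row in grad for g in row)
    print(f"number={number}: loss={loss:.6f}, largest |grad| = {largest:.3e}")
```

In this sketch the loss shifts by -log(number) while the gradient entries agree across “number” up to floating-point rounding. The gradient is extremely small because each target logit is by far the largest in its row, which is consistent with the guess that the data point sits where the gradient is almost zero.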