[solved] Calculating wrong gradients

Shiyu · May 15, 2017, 8:51pm

I am trying to calculate the gradient of J(f(x)/t, y) with respect to the input x, where J is the cross-entropy loss function and f(x) is the neural network output, y is the label and t is the scaling factor. So I am using the following code to implement it,

inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
outputs = net(inputs)
outputs.data = outputs.data / t

labels = Variable(labels).cuda(CUDA_DEVICE))
loss = criterion(outputs, labels)
loss.backward()

But it seems the gradient with respect to the input is wrong. When t is large, then inputs.grad.data.norm(1) should be very small, but it is not.

ruotianluo · May 15, 2017, 11:21pm

Don’t operate on the data; any change on the data will not be included in the graph.
Instead, if you want to do in-place division, you can do outputs.div_(t). However, as suggested in the reference, just avoid any in-place operations.

reference:
http://pytorch.org/docs/notes/autograd.html#in-place-operations-on-variables

Shiyu · June 3, 2017, 7:33pm

Done by just doing

output = output / t