[solved] Calculating wrong gradients

I am trying to compute the gradient of J(f(x)/t, y) with respect to the input x, where J is the cross-entropy loss, f(x) is the neural network output, y is the label, and t is a scaling factor. I am using the following code:

inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
outputs = net(inputs)
outputs.data = outputs.data / t

labels = Variable(labels).cuda(CUDA_DEVICE)
loss = criterion(outputs, labels)

But the gradient with respect to the input seems to be wrong. When t is large, inputs.grad.data.norm(1) should be very small, but it is not.
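
For reference, the expectation above follows from the chain rule. This is only a sketch, assuming J is softmax cross-entropy on a single example with a one-hot label y; the symbols z and p are introduced here for illustration and do not appear in the code:

```latex
\[
z = \frac{f(x)}{t}, \qquad p = \operatorname{softmax}(z), \qquad
\frac{\partial J(z, y)}{\partial x}
  = \frac{1}{t} \left( \frac{\partial f(x)}{\partial x} \right)^{\top} (p - y).
\]
```

Every entry of p - y lies in [-1, 1] and the Jacobian of f does not depend on t, so under these assumptions the gradient norm should shrink roughly like 1/t as t grows.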

Don’t operate on .data: changes made through outputs.data are not recorded in the autograd graph, so backward never sees the division by t.
If you want in-place division, you can call outputs.div_(t) instead. However, as suggested in the reference, it is better to just avoid in-place operations.
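
To make the difference concrete, here is a minimal sketch that compares the two ways of scaling. It assumes a recent PyTorch on CPU; the tiny nn.Linear model, the random inputs, and the helper input_grad_norm are hypothetical stand-ins, not the original poster's network or data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the thread's `net`, `criterion`, images and labels.
net = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss()
images = torch.randn(2, 4)
labels = torch.tensor([0, 2])
t = 1000.0


def input_grad_norm(scale_via_data):
    """Return the L1 norm of d(loss)/d(inputs) for one way of scaling."""
    inputs = images.clone().requires_grad_(True)
    outputs = net(inputs)
    if scale_via_data:
        # Bypasses autograd: the division is not recorded in the graph.
        outputs.data = outputs.data / t
    else:
        # Out-of-place division is recorded, so backward applies the 1/t factor.
        outputs = outputs / t
    loss = criterion(outputs, labels)
    loss.backward()
    return inputs.grad.norm(1).item()


norm_data = input_grad_norm(scale_via_data=True)
norm_graph = input_grad_norm(scale_via_data=False)
print(norm_data, norm_graph)
```

In this toy setup the loss value is the same either way, but the gradient from the .data version should come out about t times larger than the one from the recorded division, because the 1/t factor is missing from its backward pass.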


Solved by simply doing

outputs = outputs / t :smile: