I am trying to compute the gradient of J(f(x)/t, y) with respect to the input x, where J is the cross-entropy loss, f(x) is the neural network output, y is the label, and t is a temperature scaling factor. I am using the following code to implement it:
inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad = True)
outputs = net(inputs)
outputs.data = outputs.data / t
labels = Variable(labels).cuda(CUDA_DEVICE)
loss = criterion(outputs, labels)
loss.backward()
But the gradient with respect to the input seems wrong: when t is large, inputs.grad.data.norm(1) should be very small, but it is not.
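To make the expectation concrete: the gradient of cross-entropy on scaled logits with respect to the logits is (softmax(z/t) - onehot(y)) / t, so for large t it should shrink roughly like 1/t, and the gradient with respect to x should shrink accordingly through the chain rule. A quick numpy sanity check of that formula (the logit values below are made up for illustration):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad_wrt_logits(z, y, t):
    # gradient of CE(softmax(z / t), y) w.r.t. z: (softmax(z/t) - onehot(y)) / t
    p = softmax(z / t)
    g = p.copy()
    g[y] -= 1.0
    return g / t

z = np.array([2.0, -1.0, 0.5])  # example logits (hypothetical)
y = 0
for t in [1.0, 10.0, 1000.0]:
    # the L1 norm of the gradient should fall roughly like 1/t
    print(t, np.abs(ce_grad_wrt_logits(z, y, t)).sum())
```

In my code, though, the gradient norm stays the same regardless of t, which suggests the division in outputs.data = outputs.data / t is not being seen by autograd at all.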