I am trying to calculate the gradient of J(f(x)/t, y) with respect to the input x, where J is the cross-entropy loss function, f(x) is the neural network output, y is the label, and t is a temperature scaling factor. I implement it with the following code:

inputs = Variable(images.cuda(CUDA_DEVICE), requires_grad=True)
outputs = net(inputs)
outputs.data = outputs.data / t
labels = Variable(labels.cuda(CUDA_DEVICE))
loss = criterion(outputs, labels)
loss.backward()

But the gradient with respect to the input seems wrong. When t is large, the L1 norm of the input gradient, inputs.grad.data.norm(1), should be very small, but it is not.
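For reference, the expected 1/t scaling can be checked analytically. Below is a minimal pure-Python sketch (a toy linear model with made-up weights W and input x, not the actual network) that computes dJ/dx by hand for cross-entropy on temperature-scaled logits; the gradient norm shrinks roughly by a factor of t:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def grad_wrt_input(W, x, y, t):
    # Toy linear model f(x) = W x, so the logits are z_k = sum_j W[k][j] * x[j].
    z = [sum(wkj * xj for wkj, xj in zip(row, x)) for row in W]
    p = softmax([v / t for v in z])
    # For J = CrossEntropy(z / t, y): dJ/dz_k = (p_k - 1[k == y]) / t
    dz = [(pk - (1.0 if k == y else 0.0)) / t for k, pk in enumerate(p)]
    # Chain rule through the linear layer: dJ/dx_j = sum_k dJ/dz_k * W[k][j]
    return [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

# Hypothetical weights and input, only to illustrate the scaling behavior.
W = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
x = [0.2, -0.4]
l1 = lambda g: sum(abs(v) for v in g)

print(l1(grad_wrt_input(W, x, y=0, t=1.0)))     # O(1)
print(l1(grad_wrt_input(W, x, y=0, t=1000.0)))  # orders of magnitude smaller
```

So if dividing by t were actually part of the graph that backward() traverses, inputs.grad.data.norm(1) would have to shrink as t grows.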