Different backward results between GPU and CPU related to inputs

import torch
from torch.autograd import Variable
import torchvision

# cpu
input = Variable(torch.rand(1, 3, 224, 224), requires_grad=True)
target = Variable(torch.LongTensor([12]))
net = torchvision.models.resnet18(pretrained=True)
predict = net(input)
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(predict, target)
loss.backward()
print(input.grad)

# gpu
input = Variable(torch.rand(1, 3, 224, 224), requires_grad=True).cuda()
target = Variable(torch.LongTensor([12])).cuda()
net = torchvision.models.resnet18(pretrained=True).cuda()
predict = net(input)
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(predict, target)
loss.backward()
print(input.grad)

I ran the code above on PyTorch (v0.2) but got different results.
If I compute the gradients on the CPU, the input receives correct gradients. If I compute them on the GPU, input.grad is None. The results are below:

Variable containing:
( 0 , 0 ,.,.) = 
 -2.4594e-03 -9.2436e-04 -2.2661e-03  ...  -2.1439e-04  7.1777e-04 -1.0200e-03
 -2.8949e-03  2.3239e-03  5.5778e-03  ...   5.9012e-04  2.5235e-03 -1.2698e-03
 -7.9948e-03 -6.1932e-03  6.5822e-03  ...  -2.3984e-03  1.3701e-03 -1.5826e-03
                 ...                                      ...                
 -2.6535e-03  5.0866e-03  9.6745e-03  ...   1.0311e-02  3.6533e-03  2.8187e-03
  2.3752e-04 -3.3087e-04 -5.7010e-04  ...  -6.2533e-05  4.6445e-04  1.5096e-03
 -2.4068e-04 -2.4299e-04 -2.5069e-03  ...   2.5801e-03  2.3278e-03 -5.6533e-04

( 0 , 1 ,.,.) = 
 -4.9766e-03 -5.9452e-03 -7.1767e-03  ...   2.9989e-03  2.3204e-03 -1.2527e-05
 -4.6509e-03 -1.2889e-03  3.8101e-03  ...   5.5863e-03  4.9677e-03  4.5645e-04
 -1.2904e-02 -1.2313e-02  4.5021e-03  ...   4.9497e-03  4.0524e-03  1.1087e-03
                 ...                                      ...                
  6.4054e-03  1.2937e-02  1.7449e-02  ...   1.0382e-02  6.1005e-04 -4.8517e-04
  5.3175e-03  4.4367e-03  4.5832e-03  ...  -3.8679e-03 -5.8353e-03 -3.5039e-03
  4.8981e-03  5.2441e-03  2.7703e-03  ...  -7.3002e-05 -1.7121e-03 -3.5452e-03

( 0 , 2 ,.,.) = 
 -3.6671e-03 -3.8698e-03 -5.9004e-03  ...   1.5832e-03  9.6964e-04  4.3112e-04
 -3.9912e-03 -2.6996e-03 -2.1368e-03  ...   2.4672e-03  2.9330e-03  9.5022e-04
 -8.2490e-03 -1.0680e-02 -1.7551e-03  ...   2.8387e-03  3.4498e-03  2.0305e-03
                 ...                                      ...                
 -3.3654e-03 -2.4266e-04  3.1803e-03  ...   2.7013e-03 -2.0120e-04  2.9582e-04
 -2.9952e-04 -5.4405e-04 -2.9528e-04  ...  -4.5974e-03 -3.2213e-03 -1.6316e-03
  8.2849e-04  1.1150e-03 -3.4638e-04  ...   1.1193e-04 -4.8881e-05 -1.3465e-03
[torch.FloatTensor of size 1x3x224x224]

None

What I find hardest to understand is that in the GPU version, the gradient never seems to propagate back to the input data.

When you do:

input = Variable(torch.rand(1, 3, 224, 224), requires_grad=True).cuda()

You first create a Variable for which gradients should be computed, and then .cuda() creates another Variable on the GPU with the same content; it is this second Variable that ends up stored in input.
Because that Variable is not a leaf of the autograd graph (it was produced by an operation, the copy to the GPU), its .grad field is never populated: the gradient flows back to the original CPU Variable, to which you no longer hold a reference.
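
A quick way to see this distinction (a minimal sketch; the small tensors and the names a and b are just for illustration): a Variable you create directly is a leaf and has no grad_fn, while the result of .cuda() records the copy operation in its history and, being an intermediate result, never accumulates a .grad of its own.

import torch
from torch.autograd import Variable

a = Variable(torch.rand(2, 2), requires_grad=True)  # leaf: created by the user
b = a.cuda()                                        # non-leaf: produced by an op

print(a.grad_fn)  # None: a is a leaf, so backward() will fill a.grad
print(b.grad_fn)  # a copy function: b is an intermediate Variable

b.sum().backward()
print(a.grad)     # populated: the gradient flowed back through the copy
print(b.grad)     # None: intermediate Variables do not retain .grad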
Use:

input = Variable(torch.rand(1, 3, 224, 224).cuda(), requires_grad=True)

or

input = Variable(torch.rand(1, 3, 224, 224), requires_grad=True)
cuda_input = input.cuda()
# Use cuda_input for your net
# Use input to check the gradients
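
Putting the second option together with your original script, a minimal sketch of the fixed GPU version (same pretrained resnet18 as above) would look like this; after backward(), the gradient flows back through the .cuda() copy to the CPU leaf, so input.grad is populated instead of None:

import torch
from torch.autograd import Variable
import torchvision

input = Variable(torch.rand(1, 3, 224, 224), requires_grad=True)  # CPU leaf
cuda_input = input.cuda()          # non-leaf GPU copy used for the forward pass
target = Variable(torch.LongTensor([12])).cuda()

net = torchvision.models.resnet18(pretrained=True).cuda()
predict = net(cuda_input)
criterion = torch.nn.CrossEntropyLoss()
loss = criterion(predict, target)
loss.backward()                    # gradient propagates back to the CPU leaf

print(input.grad)                  # now a 1x3x224x224 gradient, not None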

Thanks a lot! It really helps.