Whether Variable requires_grad

Here inputs is created with the default requires_grad, labels is explicitly set to requires_grad=False, and my question is whether tmp_conv and h_init should have requires_grad=True inside forward(). Many thanks.

 import torch
 import torch.nn as nn
 import torch.nn.functional as F
 from torch.autograd import Variable

 class Net(nn.Module):
     def __init__(self):
         super(Net, self).__init__()
         # alexnet
         self.conv1 = nn.Conv2d(3, 20, 5, stride=1)
         self.conv1_bn = nn.BatchNorm2d(20)
         # for initial (hidden_dim and out_dim are defined elsewhere)
         self.fc_h2l = nn.Linear(hidden_dim, out_dim)

     def forward(self, inputs):
         # alexnet
         inputs = F.max_pool2d(F.relu(self.conv1_bn(self.conv1(inputs))), (3, 3), stride=2)
         # Variable to store intermediate conv outputs
         tmp_conv = Variable(torch.zeros(2, batch_size, inputs.size(1), inputs.size(2), inputs.size(3)))
         tmp_conv[0, :, :, :, :] = inputs.clone()
         # ......
         # for initial
         h_init = Variable(torch.randn(batch_size, hidden_dim))
         step_init = F.sigmoid(self.fc_h2l(h_init))
         # .....

 alexnet = Net()
 alexnet.cuda()

 ##### train
 inputs = Variable(inpt.cuda())
 labels = Variable(labels.cuda(), requires_grad=False)

If I understand correctly, you’re going to save some intermediate values into tmp_conv, right? In that case the Variable shouldn’t require grad, because you’re going to overwrite the original content anyway. But I think it would be much cleaner and simpler to use torch.cat or torch.stack.
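
For example, here’s a minimal sketch of the torch.stack approach inside forward(); next_step is a hypothetical stand-in for the activations the elided part of the code would produce:

 steps = [inputs]                  # pooled conv output from step 0
 steps.append(next_step)           # hypothetical activations from step 1
 tmp_conv = torch.stack(steps, 0)  # shape: (2, batch_size, C, H, W)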

h_init also doesn’t need to require grad, because you won’t even be optimizing it.
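
For contrast, a minimal sketch of the case where you would set it, assuming you wanted the initial state itself to be learned (not what’s needed here): create it once outside forward() with requires_grad=True and hand it to the optimizer along with the model parameters.

 # hypothetical: a learnable initial hidden state
 h_init = Variable(torch.randn(batch_size, hidden_dim), requires_grad=True)
 optimizer = torch.optim.SGD(list(alexnet.parameters()) + [h_init], lr=0.01)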


Yep, thanks, that is exactly what I want. However, in my classification task, tmp_conv and step_init are combined to form the final feature representation, as below. Should tmp_conv and h_init have requires_grad True or False? I’m a newbie here; I hope I’m not bothering you.

criterion = nn.CrossEntropyLoss()
fea = func(step_init, tmp_conv)
loss = criterion(fea, labels)

Here func is some function that combines the two into the final feature representation.
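
For concreteness, here is one purely hypothetical shape func could take (the post doesn’t show the real one): flatten the stored activations, concatenate them with step_init, and project to class scores for CrossEntropyLoss.

 import torch
 import torch.nn as nn

 # purely hypothetical stand-in for func
 class FeatureHead(nn.Module):
     def __init__(self, conv_feat_dim, step_dim, num_classes):
         super(FeatureHead, self).__init__()
         self.fc = nn.Linear(conv_feat_dim + step_dim, num_classes)

     def forward(self, step_init, tmp_conv):
         batch = step_init.size(0)
         # tmp_conv: (2, batch, C, H, W) -> (batch, 2*C*H*W)
         conv_flat = tmp_conv.transpose(0, 1).contiguous().view(batch, -1)
         return self.fc(torch.cat([step_init, conv_flat], 1))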

If you don’t optimize them, then leave requires_grad set to False (the default). Set it to True only for Variables whose gradient you want to obtain directly (independently of any other Variable). That doesn’t seem to be the case here, so just leave it as is.
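
Here is a minimal sketch of why that’s safe, mirroring the pattern above: the graph is built from the operations, so gradients still reach the conv weights even though the storage Variable was created with requires_grad=False.

 import torch
 import torch.nn as nn
 from torch.autograd import Variable

 conv = nn.Conv2d(3, 20, 5)
 x = Variable(torch.randn(1, 3, 32, 32))      # requires_grad=False by default
 out = conv(x)                                 # in the graph via conv.weight

 tmp = Variable(torch.zeros(2, *out.size()))   # plain storage, requires_grad=False
 tmp[0] = out                                  # autograd records the assignment

 tmp.sum().backward()
 print(conv.weight.grad is not None)           # True: gradients reached the weights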
