Gradient flow stopped on a combined model

Hi, I've run into a problem where gradients cannot backpropagate through a combined network. I checked lots of answers but couldn't find a relevant solution to this problem. I would really appreciate any help with it.

I want to compute the gradient with respect to the input data in this code:
```
for i, (input, target, impath) in tqdm(enumerate(data_loader)):
    # print('input.shape:', input.shape)
    input = Variable(input.cuda(), requires_grad=True)
    output = model(input)
    loss = criterion(output, target.cuda())
    loss = Variable(loss, requires_grad=True)
    loss.backward()
    print('input:', input.grad.data)
```
but I got this error:
```
print('input:', input.grad.data)
AttributeError: 'NoneType' object has no attribute 'data'
```
My model is a combined model whose parameters are loaded from two pretrained models.
I checked the requires_grad flag of the model weights and it is True; however, the gradients of the model weights are None.
Is loading the state dicts what blocks the gradient flow?
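For reference, the check looked roughly like this (a sketch that just iterates over the named parameters):

```
for name, param in model.named_parameters():
    print(name, param.requires_grad, param.grad)
# requires_grad prints as True for every weight, but param.grad is None
```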

How can I deal with this problem?

The model structure is attached below:
```
import torch
import torch.nn as nn
from torchvision import models

# LINEAR_LOGSOFTMAX is the pretrained classifier class (defined elsewhere)

class resnet_model(nn.Module):
    def __init__(self, opt):
        super(resnet_model, self).__init__()

        resnet = models.resnet101()
        num_ftrs = resnet.fc.in_features
        resnet.fc = nn.Linear(num_ftrs, 1000)

        if opt.resnet_path != None:
            state_dict = torch.load(opt.resnet_path)
            resnet.load_state_dict(state_dict)
            print("resnet load state dict from {}".format(opt.resnet_path))

        # model1: the ResNet-101 backbone without its final fc layer
        self.model1 = torch.nn.Sequential()
        for chd in resnet.named_children():
            if chd[0] != 'fc':
                self.model1.add_module(chd[0], chd[1])

        # model2: the layers of the pretrained classifier
        self.model2 = torch.nn.Sequential()

        self.classifier = LINEAR_LOGSOFTMAX(input_dim=2048, nclass=200)
        if opt.pretrained != None:
            self.classifier_state_dict = torch.load('../checkpoint/{}_cls.pth'.format(opt.pretrained))
            print("classifier load state dict from ../checkpoint/{}_cls.pth".format(opt.pretrained))
        self.classifier.load_state_dict(self.classifier_state_dict)

        for chd in self.classifier.named_children():
            self.model2.add_module(chd[0], chd[1])

    def forward(self, x):
        x = self.model1(x)
        x = x.view(-1, 2048)
        x = self.model2(x)
        return x
```

Hi,

Why do you have this line: `loss = Variable(loss, requires_grad=True)`?
`Variable` should not be used anymore.
So that line should be deleted, and to mark a Tensor for which you want gradients you can use:
`input = input.cuda().requires_grad_()`
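For example, your loop could then look roughly like this (a sketch reusing the same `model`, `criterion` and `data_loader` from your code):

```
for i, (input, target, impath) in tqdm(enumerate(data_loader)):
    # make the (leaf) input tensor track gradients instead of wrapping it in Variable
    input = input.cuda().requires_grad_()
    output = model(input)
    loss = criterion(output, target.cuda())
    loss.backward()                 # do not rewrap the loss before calling backward
    print('input grad:', input.grad)
```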


Sorry for nitpicking: the input should be handled as suggested by @albanD, while the loss should not be rewrapped, as this will detach the computation graph.
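To illustrate (a small standalone sketch, not taken from the code above): rewrapping creates a fresh leaf tensor with no history, so calling backward on it never reaches the input:

```
import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()                       # connected to x through the graph

rewrapped = loss.detach().requires_grad_()  # new leaf, history back to x is cut
rewrapped.backward()
print(x.grad)                               # None

loss.backward()                             # backprop through the original graph
print(x.grad)                               # tensor([...]) = 2 * x
```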


Thanks! The problem is solved with this advice!

I should learn more about PyTorch's internals.

Thanks again!