Grad is None even when I set requires_grad=True

Lusica · August 17, 2020, 3:32pm

Hi,
I ran into a problem when trying to get the gradient with respect to the input features of a pre-trained model.

The source code is

        H_Features = output[0].clone().detach().requires_grad_(True)
        V_Features = output[0].clone().detach().requires_grad_(True)
        

        print("Features:")
        print(H_Features.is_leaf)
    
        print(V_Features.requires_grad)
        
        H_model = RouteNet()
        H_model.load_state_dict(torch.load("./net_H_epoch_39_batch_15_param.pkl"), strict=False)
        V_model = RouteNet()
        V_model.load_state_dict(torch.load("./net_V_epoch_39_batch_15_param.pkl"), strict=False)
       
        target = torch.zeros((1, 1, placedb.num_bins_x, placedb.num_bins_y))
        congestion_H = H_model(H_Features)
        congestion_V = V_model(V_Features)

        congestion_H.requires_grad = True
        congestion_V.requires_grad = True
        
        loss = torch.nn.MSELoss()
  
        H_loss = loss(congestion_H, target)
        V_loss = loss(congestion_V, target)
        H_loss.requires_grad = True
        V_loss.requires_grad = True
        
        print("congestion_H:")
        print(congestion_H.requires_grad)

        print("H_loss is_leaf:")
        print(H_loss.is_leaf)
        print(V_loss.requires_grad)

        print(H_loss)
        print(V_loss)
        H_Features.retain_grad()
        V_Features.retain_grad()
        H_loss.backward(torch.tensor(1.), retain_graph=True)
        V_loss.backward(torch.tensor(1.), retain_graph=True)

        print("H_Features.grad:")
        print(H_Features.grad_fn)
        print(V_Features.grad)
       
        H_grad = H_Features.grad
        V_grad = V_Features.grad

The result shows that even though I set requires_grad=True on all parameters, the input features and the loss, after I use nn.MSELoss() the loss’s requires_grad is False and “H_loss.is_leaf” prints True.

Then I run ‘H_loss.backward()’

Every parameter’s grad is None, and the input features’ grad is also None.

I don’t know why this happens or how to solve it.

albanD · August 17, 2020, 3:53pm

Hi,

These two lines look very suspicious:

        congestion_H.requires_grad = True
        congestion_V.requires_grad = True

Why do you have to do that? If the input features already require gradients, these outputs should as well.
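For illustration, here is a minimal sketch of why forcing the flag on an output does not help. It assumes that something has disabled graph recording during the forward pass (simulated here with torch.no_grad()); the Linear layer is only a hypothetical stand-in for the real model.

```python
import torch

# Hypothetical stand-in for the pre-trained model.
model = torch.nn.Linear(3, 1)
x = torch.rand(4, 3, requires_grad=True)

# Assumption: graph recording is disabled while the model runs.
with torch.no_grad():
    out = model(x)

# The output is a leaf that does not require grad.
print(out.requires_grad, out.is_leaf)

# Forcing the flag makes `out` a fresh leaf; it is still not connected to x.
out.requires_grad = True
loss = out.pow(2).mean()
loss.backward()

print(x.grad)                # None: there is no path from loss back to x
print(out.grad is not None)  # the gradient stops at `out`
```

The backward call succeeds, but the gradient only reaches the tensor whose flag was forced, never the input features or the weights.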

Lusica · August 17, 2020, 3:54pm

Yep, that’s weird. I have to set it myself, otherwise it is False.

albanD · August 17, 2020, 3:58pm

Then there is something in your model that breaks autograd.
You should check in there where it happens (at which point requires_grad becomes False).
I can take a look if you can share the model here.
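One possible way to do that check is with forward hooks. This is only a sketch: report_requires_grad is a hypothetical helper, and the small Sequential model is a stand-in for the real network.

```python
import torch

def report_requires_grad(model):
    """Register forward hooks that record whether each submodule's output requires grad."""
    records = []
    handles = []
    for name, module in model.named_modules():
        if name == "":
            continue  # skip the root module itself

        def hook(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor):
                records.append((name, output.requires_grad))

        handles.append(module.register_forward_hook(hook))
    return records, handles

# Hypothetical small model standing in for the real one.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, 1, 1),
    torch.nn.MaxPool2d(2, 2),
    torch.nn.Conv2d(8, 1, 3, 1, 1),
)
records, handles = report_requires_grad(model)

x = torch.rand(1, 3, 16, 16, requires_grad=True)
model(x)
for name, flag in records:
    print(name, flag)

# Remove the hooks once the check is done.
for h in handles:
    h.remove()
```

The first layer whose flag is False is where the graph gets cut. If every layer reports False, the cause is likely outside the model, for example grad mode being disabled around the call.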

Lusica · August 17, 2020, 4:09pm

The model structure is shown as follows.

class RouteNet(torch.nn.Module):
    def __init__(self):
        super(RouteNet, self).__init__()
        self.encode = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 9, 1, 4),
            torch.nn.MaxPool2d(2, 2, 0, 1),
            torch.nn.Conv2d(32, 64, 7, 1, 3),
            torch.nn.MaxPool2d(2, 2, 0, 1),
            torch.nn.Conv2d(64, 32, 9, 1, 4))
        self.decode = torch.nn.Sequential(
            torch.nn.Conv2d(32, 32, 7, 1, 3),
            torch.nn.ConvTranspose2d(32, 16, 9, 2, 4, 1),
            torch.nn.Conv2d(16, 16, 5, 1, 2),
            torch.nn.ConvTranspose2d(16, 4, 5, 2, 2, 1),
            torch.nn.Conv2d(4, 1, 3, 1, 1))
 
    def forward(self, x):
        encode_out = self.encode(x)
        # res = conv_out.view(conv_out.size(0), -1)
        out = self.decode(encode_out)
        return out

when I use

for name, param in H_model.state_dict().items():
    print(name)
    print("requires_grad: ", param.requires_grad)

it shows

encode.0.weight
requires_grad:  False
encode.0.bias
requires_grad:  False
encode.2.weight
requires_grad:  False
encode.2.bias
requires_grad:  False
encode.4.weight
requires_grad:  False
encode.4.bias
requires_grad:  False
decode.0.weight
requires_grad:  False
decode.0.bias
requires_grad:  False
decode.1.weight
requires_grad:  False
decode.1.bias
requires_grad:  False
decode.2.weight
requires_grad:  False
decode.2.bias
requires_grad:  False
decode.3.weight
requires_grad:  False
decode.3.bias
requires_grad:  False
decode.4.weight
requires_grad:  False
decode.4.bias
requires_grad:  False

However if I use

for p in H_model.named_parameters():
    print("requires_grad: ", p[1].requires_grad)

it shows

requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True
requires_grad:  True

I think you may be right, but I don’t understand what exactly these requires_grad values mean.

albanD · August 17, 2020, 4:18pm

Hi,

Yes, this is expected. Getting the state_dict is not a differentiable operation (it returns detached tensors). But that shouldn’t be a problem here as you’re not doing it during the forward.

In any case, since the input to your model requires_grad, then all the output will as well (even if the weights don’t).

But if your model is just the same as the RouteNet you shared above, then the output will require gradient if the input x does.
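Both points can be checked with a short sketch. The single Conv2d layer is a stand-in for the full model, and freezing the weights is only done here to show that the input alone is enough.

```python
import torch

net = torch.nn.Conv2d(3, 1, 3, 1, 1)

# state_dict() returns detached tensors by default, so the flag reads False.
sd_flags = [t.requires_grad for t in net.state_dict().values()]
# named_parameters() yields the actual Parameter objects.
param_flags = [p.requires_grad for _, p in net.named_parameters()]
print(sd_flags, param_flags)

# Even with frozen weights, the output tracks gradients if the input does.
for p in net.parameters():
    p.requires_grad_(False)
inp = torch.rand(1, 3, 8, 8, requires_grad=True)
out = net(inp)
print(out.requires_grad)

out.sum().backward()
print(inp.grad.shape)
```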

Lusica · August 17, 2020, 4:20pm

So it gets weird: the input features require gradients, but requires_grad on the model output is False. Is there a bug in my code?

albanD · August 17, 2020, 4:24pm

There is definitely something else going on.
Doing this prints True for me:

import torch

class RouteNet(torch.nn.Module):
    def __init__(self):
        super(RouteNet, self).__init__()
        self.encode = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 9, 1, 4),
            torch.nn.MaxPool2d(2, 2, 0, 1),
            torch.nn.Conv2d(32, 64, 7, 1, 3),
            torch.nn.MaxPool2d(2, 2, 0, 1),
            torch.nn.Conv2d(64, 32, 9, 1, 4))
        self.decode = torch.nn.Sequential(
            torch.nn.Conv2d(32, 32, 7, 1, 3),
            torch.nn.ConvTranspose2d(32, 16, 9, 2, 4, 1),
            torch.nn.Conv2d(16, 16, 5, 1, 2),
            torch.nn.ConvTranspose2d(16, 4, 5, 2, 2, 1),
            torch.nn.Conv2d(4, 1, 3, 1, 1))
 
    def forward(self, x):
        encode_out = self.encode(x)
        # res = conv_out.view(conv_out.size(0), -1)
        out = self.decode(encode_out)
        return out

net = RouteNet()
inp = torch.rand(2, 3, 100, 100, requires_grad=True)

out = net(inp)
print(out.requires_grad)

Lusica · August 17, 2020, 4:43pm

It doesn’t work for me… It still shows False. There must be something wrong and I will check it further.

Thanks a lot for your help. Hope you stay safe and healthy.

albanD · August 17, 2020, 4:44pm

Ah!
Well, I would recommend updating your version of PyTorch to the latest, and making sure you don’t have conflicting versions installed.
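A quick way to see which installation is actually being imported is to print its version and location. The grad-mode check at the end is an extra suggestion, not something confirmed as the cause at this point in the thread.

```python
import torch

# Which PyTorch build is imported, and from which path on disk.
print(torch.__version__)
print(torch.__file__)

# Whether autograd is recording at this point in the program.
print(torch.is_grad_enabled())
```

If the path points somewhere unexpected (for example a different environment's site-packages), there may be more than one install on the machine.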

Lusica · August 17, 2020, 11:29pm

I found the problem. It shows up when I run this code:

import torch
from torch import nn
from torch.autograd import Function

class RouteNet(torch.nn.Module):
    def __init__(self):
        super(RouteNet, self).__init__()
        self.encode = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 9, 1, 4),
            torch.nn.MaxPool2d(2, 2, 0, 1),
            torch.nn.Conv2d(32, 64, 7, 1, 3),
            torch.nn.MaxPool2d(2, 2, 0, 1),
            torch.nn.Conv2d(64, 32, 9, 1, 4))
        self.decode = torch.nn.Sequential(
            torch.nn.Conv2d(32, 32, 7, 1, 3),
            torch.nn.ConvTranspose2d(32, 16, 9, 2, 4, 1),
            torch.nn.Conv2d(16, 16, 5, 1, 2),
            torch.nn.ConvTranspose2d(16, 4, 5, 2, 2, 1),
            torch.nn.Conv2d(4, 1, 3, 1, 1))
 
    def forward(self, x):
        encode_out = self.encode(x)
        # res = conv_out.view(conv_out.size(0), -1)
        out = self.decode(encode_out)
        return out

class CongestionComputeFunction(Function):

    @staticmethod
    def forward(ctx):
        net = RouteNet()
        inp = torch.rand(2, 3, 100, 100, requires_grad=True)
        out = net(inp)
        print(out.requires_grad)    #False
    
    @staticmethod
    def backward(ctx):
        # ret = torch.rand(2, 3, 100, 100, requires_grad=True)
        return None


class CongestionCompute(nn.Module):
    def __init__(self):
        super(CongestionCompute, self).__init__()

    def forward(self):
        return CongestionComputeFunction.apply()

if __name__=='__main__':
    congestion = CongestionCompute()
    congestion.forward()

Do you have a solution for this?

albanD · August 18, 2020, 2:12pm

Hi,

It is expected that requires_grad=False inside the custom Function. Custom Functions exist precisely so that you specify what the backward should be yourself, so there is no reason to track the gradients during the forward.
You’re also missing a return statement in that forward, by the way.

Why are you using a custom Function here?
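To make this concrete, here is a minimal sketch of a custom Function. Square and the seen_inside_forward list are hypothetical names used only for this example; the point is that results computed inside forward do not require grad, while the value returned by apply() does, because the hand-written backward is attached to it.

```python
import torch
from torch.autograd import Function

seen_inside_forward = []  # records the flag observed inside forward

class Square(Function):
    """Hypothetical custom Function: y = x ** 2 with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        y = x * x
        # Graph recording is off here, so intermediate results do not require grad.
        seen_inside_forward.append(y.requires_grad)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d(x^2)/dx = 2x, chained with the incoming gradient.
        return 2 * x * grad_output

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = Square.apply(x)
print(seen_inside_forward, y.requires_grad)

y.sum().backward()
print(x.grad)
```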

Lusica · August 19, 2020, 12:41am

Hi,

Yes, I found that I cannot get the grad inside the custom Function. I will try loading my model in an nn.Module instead.
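A minimal sketch of that approach could look like the following. CongestionLoss is a hypothetical wrapper name, the two-layer Sequential is a stand-in for RouteNet (which would be loaded with load_state_dict), and freezing the weights is an assumption made because only the input gradient is wanted.

```python
import torch
from torch import nn

class CongestionLoss(nn.Module):
    """Hypothetical wrapper: runs a frozen model and returns an MSE loss against zeros."""

    def __init__(self, model):
        super().__init__()
        self.model = model
        for p in self.model.parameters():
            p.requires_grad_(False)  # weights stay fixed; only the input gets a gradient

    def forward(self, features):
        pred = self.model(features)
        return nn.functional.mse_loss(pred, torch.zeros_like(pred))

# Small stand-in network; the real RouteNet would be loaded with load_state_dict here.
model = nn.Sequential(nn.Conv2d(3, 4, 3, 1, 1), nn.Conv2d(4, 1, 3, 1, 1))
features = torch.rand(1, 3, 16, 16, requires_grad=True)

loss = CongestionLoss(model)(features)
loss.backward()
print(loss.requires_grad, features.grad.shape)
```

Because the forward pass runs under ordinary autograd, no manual requires_grad assignments on the output or the loss are needed.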

Thank you so much.

Best regards,
Siting