nn.LSTM not working together with functional_call for calculating the gradient

cijerezg · December 8, 2022, 6:04am

I have the following model:

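import math
import torch
from torch import nn
from torch.distributions import Normal

epsilon = 1e-6  # assumed value; the original post does not define epsilon

class Encoder(nn.Module):
    def __init__(self, action_dim, z_dim, skill_length):
        super().__init__()

        # Note: the third positional argument of nn.LSTM is num_layers
        self.lstm = nn.LSTM(action_dim, z_dim, skill_length, batch_first=True)
        self.log_std = nn.Parameter(torch.Tensor(z_dim))

    def forward(self, skill):
        mean, _ = self.lstm(skill)
        mean = mean[:, -1, :]  # LSTM output at the last time step
        std = torch.exp(torch.clamp(self.log_std, min=math.log(epsilon)))
        density = Normal(mean, std)
        sample = density.rsample()

        return sample, density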

Then, I have the following code to extract and initialize parameters:

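def pars(model):
    params = {}
    for name, param in model.named_parameters():
        if 'std' in name:
            init = torch.nn.init.constant_(param, 0)
        elif param.dim() >= 2:
            init = torch.nn.init.orthogonal_(param)
        else:
            # orthogonal_ rejects tensors with fewer than 2 dimensions
            # (e.g. the LSTM biases), so they keep their default values here
            init = param
        # Wrap in a new nn.Parameter: a leaf tensor distinct from the module's own parameter object
        params[name] = nn.Parameter(init)
    return params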

Then, I’d like to do this:

model = Encoder(6, 2, 10)
x = torch.rand(25, 10, 6)
params = pars(model)

samp, d = functional_call(model, params, x)

grad = autograd.grad(torch.mean(samp), params.values(), retain_graph=True, allow_unused=True)

I expect gradients with respect to both the LSTM parameters and log_std, but the entries for the LSTM parameters come back as None.
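To see exactly which entries are None, the gradients can be paired with their parameter names. The following is only a minimal sketch: `TinyEncoder` and `grad_report` are hypothetical names, the module is a simplified stand-in for `Encoder` (single-layer LSTM, no clamp), and the import falls back to `torch.nn.utils.stateless` on the assumption that an older PyTorch without `torch.func` may be installed. The result for the LSTM entries may differ between PyTorch versions.

```python
import torch
from torch import nn, autograd

try:
    from torch.func import functional_call  # PyTorch 2.0 and later
except ImportError:
    from torch.nn.utils.stateless import functional_call  # PyTorch 1.12 / 1.13


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for Encoder: one LSTM plus a free log_std parameter."""

    def __init__(self, in_dim=6, z_dim=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, z_dim, batch_first=True)
        self.log_std = nn.Parameter(torch.zeros(z_dim))

    def forward(self, x):
        out, _ = self.lstm(x)
        mean = out[:, -1, :]
        # Reparameterised sample: mean + std * noise
        return mean + torch.exp(self.log_std) * torch.randn_like(mean)


def grad_report(model, params, x):
    """Map each parameter name to True if autograd returned a gradient for it."""
    out = functional_call(model, params, (x,))
    grads = autograd.grad(out.mean(), list(params.values()), allow_unused=True)
    return {name: g is not None for name, g in zip(params, grads)}


torch.manual_seed(0)
model = TinyEncoder()
# Fresh leaf tensors, analogous to what pars() returns
params = {n: nn.Parameter(p.detach().clone()) for n, p in model.named_parameters()}
report = grad_report(model, params, torch.rand(25, 10, 6))
for name, has_grad in report.items():
    print(f"{name:24s} {'ok' if has_grad else 'None'}")
```

Any name printed with `None` is a tensor in `params` that the forward pass did not actually use.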

When I run

samp, d = model(x)
grad = autograd.grad(torch.mean(samp), model.parameters(), retain_graph=True)

then I get the expected result: the gradients are non-None for the LSTM parameters as well as for log_std.

I know the problem is specific to the LSTM layer, because when I replace it with an nn.Linear layer, functional_call yields gradients for both log_std and the linear layer. Unfortunately, I do not know how to resolve this problem. I’d appreciate any help.
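For comparison, the nn.Linear variant mentioned above can be sketched as follows. `LinearEncoder` is a hypothetical simplification (no clamp, a hand-written reparameterised sample instead of `Normal.rsample`), and the fallback import again assumes an older PyTorch without `torch.func` might be in use.

```python
import torch
from torch import nn, autograd

try:
    from torch.func import functional_call  # PyTorch 2.0 and later
except ImportError:
    from torch.nn.utils.stateless import functional_call  # PyTorch 1.12 / 1.13


class LinearEncoder(nn.Module):
    """Hypothetical variant of Encoder with nn.Linear in place of nn.LSTM."""

    def __init__(self, action_dim=6, z_dim=2):
        super().__init__()
        self.linear = nn.Linear(action_dim, z_dim)
        self.log_std = nn.Parameter(torch.zeros(z_dim))

    def forward(self, skill):
        # Apply the linear layer to every time step, then keep the last one
        mean = self.linear(skill)[:, -1, :]
        std = torch.exp(self.log_std)
        return mean + std * torch.randn_like(mean)


torch.manual_seed(0)
linear_model = LinearEncoder()
# Fresh leaf tensors passed in place of the module's own parameters
linear_params = {
    n: nn.Parameter(p.detach().clone()) for n, p in linear_model.named_parameters()
}
x = torch.rand(25, 10, 6)

sample = functional_call(linear_model, linear_params, (x,))
linear_grads = autograd.grad(
    sample.mean(), list(linear_params.values()), allow_unused=True
)
missing = [n for n, g in zip(linear_params, linear_grads) if g is None]
print("parameters without a gradient:", missing)
```

Here every tensor in `linear_params` takes part in the forward pass, so `missing` is expected to be empty.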