Can we train a model with only a BiLSTM layer?

Hi

I am experimenting with a small network that is part of another, larger network, to find out whether the experimental model computes gradients. The experimental setup contains only a BiLSTM layer (the model deliberately has no linear layer) and receives an input of size torch.Size([64, 256]). The model structure is as follows:

import torch
import torch.nn as nn

class Experimental_(nn.Module):

    def __init__(self):
        super().__init__()
        # 2-layer bidirectional LSTM: 256 input features, 128 hidden units per direction
        self.lstm = nn.LSTM(256, 128, 2, batch_first=True, bidirectional=True)

    def forward(self, input):
        lstm_output, (h, c) = self.lstm(input)
        # forward and backward states are concatenated, giving 2 * 128 features
        return lstm_output.view(-1, 128 * 2)
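
For reference, here is a quick shape check I ran on the layer in isolation (a standalone snippet with a fresh layer and random data, not my actual inputs). With batch_first=True, a 2-D input is treated as a single unbatched sequence, so torch.Size([64, 256]) is read as sequence length 64 with 256 features:

import torch
import torch.nn as nn

lstm = nn.LSTM(256, 128, 2, batch_first=True, bidirectional=True)
x = torch.randn(64, 256)               # unbatched: (seq_len=64, features=256)
out, (h, c) = lstm(x)
print(out.shape)                       # torch.Size([64, 256]): 2 directions * 128
print(out.view(-1, 128 * 2).shape)     # torch.Size([64, 256]): same layout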

The training method is designed as follows:

def train():
    model.train()
    model.zero_grad()
    output_ = model(input)
    loss = lossFunction(output_, train_y)
    loss.backward()
    # aim is to check the gradient values of the BiLSTM layers
    # for name, param in model.named_parameters():
    #     print(name, param.grad)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    

I use the following driver code:

from torch.optim import AdamW

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Experimental_()
optimizer = AdamW(model.parameters(), lr=2e-5)
lossFunction = nn.NLLLoss()
epochs = 1
current = 1
while current <= epochs:
    train()
    current = current + 1

But when I execute this, I receive the following error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
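
For context, my understanding is that this message normally appears when backward() is called a second time through the same graph, as in the toy snippet below (my own minimal example, unrelated to the model above), yet in my training method I call backward() only once per step:

import torch

w = torch.randn(3, requires_grad=True)
loss = (w * 2).sum()
loss.backward()   # first backward frees the graph's saved intermediate values
loss.backward()   # raises: "Trying to backward through the graph a second time ..."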

Could you please help me understand what is missing in these lines of code?

Which version of torch are you using?
The following runs without error for me locally (built from source recently) and on Colab:

import torch

class Experimental_(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(256, 128, 2, batch_first=True, bidirectional=True)

    def forward(self, input):
        lstm_output, (h, c) = self.lstm(input)
        return lstm_output.view(-1, 128 * 2)

def train():
    model.train()
    model.zero_grad()
    output_ = model(inp)
    loss = lossFunction(output_, train_y)
    loss.backward()
    # aim is to check the gradient values of the BiLSTM layers
    # for name, param in model.named_parameters():
    #     print(name, param.grad)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()

inp = torch.randn(10, 256)
train_y = torch.randint(0, 10, (10,))
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Experimental_()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
lossFunction = torch.nn.NLLLoss()
epochs = 1
current = 1
while current <= epochs:
    train()
    current = current + 1
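
Note that in my snippet inp is a fresh random tensor with no autograd history. Since you mention your model is part of a larger network, one possible cause (just a guess, I cannot tell from the posted code) is that input is produced once by another module and then reused across training steps: the first backward() frees the saved tensors of that upstream graph, and every later backward() tries to walk through it again. A minimal sketch, with hypothetical upstream and head modules standing in for your setup:

import torch

upstream = torch.nn.Linear(256, 256)     # hypothetical stand-in for the larger network
head = torch.nn.Linear(256, 2)           # hypothetical stand-in for the experimental model

shared = upstream(torch.randn(64, 256))  # computed once; still tied to upstream's graph

try:
    for _ in range(2):
        head(shared).sum().backward()    # 2nd pass re-enters upstream's freed graph
except RuntimeError as e:
    print(e)                             # "Trying to backward through the graph a second time ..."

inp2 = shared.detach()                   # cut the tie to the upstream graph
for _ in range(2):
    head(inp2).sum().backward()          # fine: each step builds (and frees) its own graph

If that is what happens in your setup, detaching the input (or recomputing it inside train()) should remove the error.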