Hi
I am experimenting with a network that is part of a larger network, to find out whether the experimental model computes gradients. The experimental setup contains only a BiLSTM layer (the model purposely has no linear layer) and takes an input of size torch.Size([64, 256]). The model structure is as follows:
class Experimental_(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(256, 128, 2, batch_first=True, bidirectional=True)

    def forward(self, input):
        lstm_output, (h, c) = self.lstm(input)
        return lstm_output.view(-1, 128 * 2)
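For reference, here is a minimal sketch of the shapes this layer produces for a 2-D input. It assumes a PyTorch version that accepts unbatched 2-D input for nn.LSTM (1.11 or later, as far as I know), and the random tensor `x` is only a placeholder for the real input, which is not shown in this post.

```python
import torch
import torch.nn as nn

# Same layer configuration as in the model above.
lstm = nn.LSTM(256, 128, 2, batch_first=True, bidirectional=True)

# Placeholder input (assumption): a 2-D tensor is interpreted as one
# unbatched sequence, here 64 time steps with 256 features each.
x = torch.randn(64, 256)
out, (h, c) = lstm(x)

out_shape = tuple(out.shape)  # (seq_len, 2 * hidden_size)
h_shape = tuple(h.shape)      # (num_layers * num_directions, hidden_size)
print(out_shape, h_shape)
```

So with this input the 64 rows are treated as time steps of a single sequence rather than as 64 independent samples, and the `view(-1, 128 * 2)` call leaves the output shape unchanged.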
The training function is defined as follows:
def train():
    model.train()
    model.zero_grad()
    output_ = model(input)
    loss = lossFunction(output_, train_y)
    loss.backward()
    # Aim: check the gradient values of the BiLSTM layers
    # for name, param in model.named_parameters():
    #     print(name, param.grad)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
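The gradient check I have in mind can be sketched in isolation as below. The input `x` and targets `y` are random placeholders, and the `LogSoftmax` before `NLLLoss` is an assumption added for this sketch (NLLLoss expects log-probabilities); neither is part of my actual setup.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(256, 128, 2, batch_first=True, bidirectional=True)
log_softmax = nn.LogSoftmax(dim=1)  # assumption: not in the original model
loss_fn = nn.NLLLoss()

x = torch.randn(64, 256)          # placeholder input
y = torch.randint(0, 256, (64,))  # placeholder class targets

out, _ = lstm(x)
loss = loss_fn(log_softmax(out.view(-1, 128 * 2)), y)
loss.backward()

# Collect the gradient norm of every parameter that received a gradient.
grad_norms = {
    name: param.grad.norm().item()
    for name, param in lstm.named_parameters()
    if param.grad is not None
}
for name, norm in grad_norms.items():
    print(f"{name}: {norm:.6f}")
```

Printing norms instead of full gradient tensors keeps the output short; a parameter missing from `grad_norms` would mean it received no gradient.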
I use the following driver code:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Experimental_()
optimizer = AdamW(model.parameters(), lr=2e-5)
lossFunction = nn.NLLLoss()
epochs = 1
current = 1
while current <= epochs:
    train()
    current = current + 1
But after running the code, I get the following error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
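One situation that produces this message, which may or may not be what happens here, is an input tensor that was computed once by another module and then reused across several calls to backward. The sketch below assumes exactly that: `upstream` is a hypothetical stand-in for the larger network, and `step` is a simplified training step, not the function above.

```python
import torch
import torch.nn as nn

upstream = nn.Linear(32, 256)  # hypothetical stand-in for the larger network
lstm = nn.LSTM(256, 128, 2, batch_first=True, bidirectional=True)

# Computed once, outside the loop: this tensor carries upstream's graph.
features = upstream(torch.randn(64, 32))

def step(inp):
    """Simplified training step: forward through the LSTM, then backward."""
    lstm.zero_grad()
    out, _ = lstm(inp)
    out.sum().backward()

step(features)  # works; upstream's saved tensors are freed afterwards
try:
    step(features)  # backward reaches the already-freed upstream graph
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("second call failed:", str(err)[:60])

# Detaching cuts the link to the upstream graph, so repeated steps work.
detached = features.detach()
step(detached)
step(detached)
detached_ok = True
```

If the upstream module is supposed to be trained as well, detaching is not appropriate; in that case the input would have to be recomputed inside each training step instead.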
Could you please help me understand what is missing in this code?