I am training a simple LSTM model, but PyTorch gives me an error saying that I need to set retain_graph=True:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
I do not want to use retain_graph=True because it makes training take longer, and I do not think my simple LSTM should need it. What am I doing wrong?
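For context, here is a hedged sketch of the kind of training loop that typically triggers this error. The model and shapes are illustrative (not from the original post): the key detail is that the hidden state returned by the LSTM is fed back in on the next step, so the second backward() walks into the first step's already-freed graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical minimal setup: reusing the returned hidden state across
# training steps keeps the old computation graph alive, so the second
# backward() tries to traverse buffers freed by the first backward().
lstm = nn.LSTM(input_size=3, hidden_size=4)
opt = torch.optim.SGD(lstm.parameters(), lr=0.1)

h = (torch.zeros(1, 1, 4), torch.zeros(1, 1, 4))
got_error = False
for step in range(2):
    x = torch.randn(5, 1, 3)
    out, h = lstm(x, h)       # h still points into the previous step's graph
    try:
        out.sum().backward()  # step 1 raises: "Trying to backward ... a second time"
    except RuntimeError:
        got_error = True
        break
    opt.step()
    opt.zero_grad()

print(got_error)
```

The first step succeeds; the second fails, because the carried-over h links the two graphs together.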
The problem is that the hidden states in your model are shared from one invocation to the next, so all of the computation graphs are linked together.
In particular, because the LSTM module runs the whole forward pass in one call, you do not need to save the final hidden states:
In the most basic form of an LSTM, IIRC, the hidden state for the first iteration should be full of 0s.
Since the module you use performs all the iterations in one call, you only ever need to provide a Tensor full of 0s for it.
If you want to train the initial hidden state, you can declare it as an nn.Parameter and pass it as input to the lstm (but still ignore the returned final state).
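A sketch of both suggestions, with illustrative names (SimpleLSTM, the sizes, etc. are assumptions, not from the original post): the initial state lives in the module as nn.Parameter tensors, it is passed in on every forward call, and the returned final state is ignored, so no graph is shared between training steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class SimpleLSTM(nn.Module):
    """Illustrative model: learnable initial hidden state, fresh graph each step."""
    def __init__(self, input_size=3, hidden_size=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        # Learnable initial (h0, c0); replace with plain torch.zeros(...)
        # tensors if you just want the fixed all-zeros initial state.
        self.h0 = nn.Parameter(torch.zeros(1, 1, hidden_size))
        self.c0 = nn.Parameter(torch.zeros(1, 1, hidden_size))

    def forward(self, x):
        # Pass the initial state in; ignore the returned final state so
        # nothing carries a graph over to the next invocation.
        out, _ = self.lstm(x, (self.h0, self.c0))
        return out

model = SimpleLSTM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(2):
    x = torch.randn(5, 1, 3)   # (seq_len, batch, input_size)
    loss = model(x).sum()
    opt.zero_grad()
    loss.backward()            # no retain_graph needed: each step builds a fresh graph
    opt.step()
```

Because h0 and c0 are Parameters, the optimizer updates them like any other weight, while each training step still backpropagates through only its own graph.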
But after the first iteration/step of training, should the first hidden state still be all zeroes? I’m guessing not, since the first hidden layer’s weight values change when we’re training. In that case:
On my first training step:
my h0 has changed from all zeroes to some non-zero weights - let’s say h0*.
On my second training step:
how do I specify that I want the updated h0*
i.e. lstm_out, _ = self.lstm(x.view(-1, 1, 3),h0*)
without sharing the hidden layers from one invocation to the next?
I’m sure I’m confusing something here - but I just wanted to double check.
Not necessarily. Keeping them as 0 for the first step all the time is valid as well.
And it reduces the number of parameters, so it might help avoid overfitting.