I’m working on an RNN at the moment, but the retain_graph option eventually consumes all of my GPU memory, and training seems to get slower every epoch.
However, when I don’t specify retain_graph=True I get the following error: “RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time”. I notice this only happens when I keep the hidden value; if my hidden is always None, it works just fine.
When computing gradients with the backward call, PyTorch automatically frees the computation graph used to create all the variables, and only stores the gradients on the parameters in order to perform the update (intermediate values are deleted).
In your case, what I guess is happening is that after computing the derivatives of your criterion you use the hidden state to compute the cost at time t+1, so when you call backward again on that cost PyTorch does not know how to backtrack through the already-freed graph. In an RNN it is natural to compute the cost this way, since you have to keep track of the recurrence.
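A minimal sketch of this failure (all names illustrative, not your actual code): a carried hidden value still points into a graph that a previous backward() already freed, which triggers exactly this error; detaching the hidden value (or using retain_graph=True) avoids it.

```python
import torch

w = torch.randn(1, requires_grad=True)

def run_two_steps(detach):
    hidden = torch.zeros(1)
    for _ in range(2):
        if detach:
            hidden = hidden.detach()   # cut the link to the previous step's graph
        hidden = hidden * w            # hidden now depends on w (and its past)
        hidden.sum().backward()        # frees the current graph afterwards
        w.grad = None

run_two_steps(detach=True)             # fine: each backward sees a fresh graph

try:
    run_two_steps(detach=False)        # second backward walks into the freed graph
except RuntimeError as e:
    print("second backward failed:", e)
```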
What you might do is free the memory when the epoch finishes. If you post the code of your main loop, I can suggest a modification.
Your code explodes because of loss_avg += loss. If you do not free the buffers (retain_graph=True — and you have to set it to True because you need it to compute the recurrence gradient), then everything is stored in loss_avg. Keep in mind that loss, in your case, is not only the cross-entropy or whatever criterion you use: it is everything used to compute it. If you want to keep track of the scalar value that represents your accumulated loss, you can do loss_avg += loss.data (although the use of .data is deprecated, for cases like this I still find it useful, clean and simple). This will store only the actual scalar value.
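The difference can be seen directly in a small sketch (illustrative names): adding the loss tensor itself to an accumulator keeps its whole graph reachable, while adding the raw number (loss.data back then, loss.item() in current PyTorch) stores just a Python float with nothing attached.

```python
import torch

w = torch.randn(3, requires_grad=True)
loss = (w * torch.randn(3)).sum()

acc_tensor = 0
acc_tensor = acc_tensor + loss   # a tensor with a grad_fn: the graph stays alive
acc_scalar = 0.0
acc_scalar += loss.item()        # a plain Python float: nothing to free

print(acc_tensor.grad_fn is not None)  # True: still tied to the graph
print(type(acc_scalar))                # <class 'float'>
```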
Anyway, I think your code should look something like this:
for e in range(epochs):
    # x should be 3-dimensional (recurrence, samples, dimension) if your network
    # is fully connected, else 4-dimensional (time_step, batch, rows, cols)
    for idx, (x, target) in enumerate(data_loader):
        for t in range(time_steps):
            ...
I thought the nn.LSTM module already took care of the recurrence step, since the documentation for nn.LSTM says: “output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t.”
Okay, if you use nn.LSTM() you have to call .backward() with retain_graph=True so PyTorch can backpropagate through time, and then call optimizer.step(). Your problem, then, is in accumulating the loss for printing (monitoring or whatever). Just do loss_avg += loss.data, because otherwise you will be storing the computation graphs from all the epochs. Since the graph is not freed during the backward call, you have to do it this way to keep only the scalar value representing the cost.
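For reference, here is a runnable sketch of such a loop (shapes, names and data are all illustrative). Note that instead of retaining the full graph across batches, this variant uses the common truncated-BPTT pattern: the carried hidden state is detached, so each backward() only runs through the current batch and no retain_graph is needed; the loss is accumulated as a scalar so no graphs pile up.

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 4, 3, 8
lstm = nn.LSTM(input_size, hidden_size)
head = nn.Linear(hidden_size, 1)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

hidden = None
loss_avg = 0.0
for step in range(3):                               # stands in for the data loader
    x = torch.randn(seq_len, batch, input_size)
    target = torch.randn(seq_len, batch, 1)
    if hidden is not None:
        hidden = tuple(h.detach() for h in hidden)  # cut the old graph here
    output, hidden = lstm(x, hidden)
    loss = nn.functional.mse_loss(head(output), target)
    optimizer.zero_grad()
    loss.backward()                                 # no retain_graph needed
    optimizer.step()
    loss_avg += loss.item()                         # scalar only, no graph stored
print(loss_avg / 3)
```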
Thanks for the help. So I don’t need to free the memory manually? Anyhow, I’d like to know how to do it properly, if you could give me some reference, haha. Thanks!