GRU/RNN model unrolling

I’m training a CNN+GRU that needs to handle long sequences. The problem is that the model cannot fit in GPU memory at such sequence lengths because of the unrolling in time. Is there a way to prevent this? Keras has an `unroll=False` option when creating a model, but I can’t find anything like it in PyTorch.

You could try passing in half of your sequence, detaching the output hidden state from the backprop graph, and then passing that hidden state into the GRU again along with the second half of the sequence. This is essentially truncated backpropagation through time (TBPTT): the hidden state values still carry information forward, but gradients stop at each detach point, so the autograd graph never spans the full sequence.
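
A minimal sketch of that idea, generalized from two halves to fixed-size chunks. The model, tensor sizes, and data here are made up for illustration; the key part is the `h.detach()` between chunks:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(gru.parameters()) + list(head.parameters()))

seq = torch.randn(4, 1000, 8)      # (batch, time, features): too long to unroll at once
target = torch.randn(4, 1000, 1)
chunk_len = 100                    # backprop graph only ever spans this many steps

h = None                           # initial hidden state (PyTorch treats None as zeros)
for start in range(0, seq.size(1), chunk_len):
    x = seq[:, start:start + chunk_len]
    y = target[:, start:start + chunk_len]

    out, h = gru(x, h)
    loss = nn.functional.mse_loss(head(out), y)

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Detach so gradients from the next chunk stop here; the hidden
    # state values still carry forward as context for the next chunk.
    h = h.detach()
```

Memory now scales with `chunk_len` instead of the full sequence length, at the cost of gradients not flowing across chunk boundaries.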