The recurrent connection exists only in the LSTM, not in the CNN.
I use a pre-trained CNN, so I don't need to train the CNN itself.
All I need is the gradient that flows back through the CNN in order to train the LSTM.
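To make the setup concrete, this is roughly how I freeze the CNN (resnet18 is just an example of a pre-trained network, not my actual model):

```python
import torch
from torchvision import models

# Example pre-trained CNN (stand-in for my real network).
cnn = models.resnet18(pretrained=True)
cnn.eval()
for p in cnn.parameters():
    p.requires_grad_(False)  # no gradients for the CNN's own weights

# Even with frozen parameters, autograd still propagates gradients *through*
# the CNN to whatever produced its input, which is what I need for the LSTM.
```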
Because the CNN requires a lot of memory, I want to share its gradient memory across time steps during BPTT.
For example, the gradient coming out of the CNN would be computed and saved for LSTM training at each time step, but the CNN's internal gradient memory (or buffers?) would be reused for the next step.
If I follow the conventional RNN training code (forward for t = 1, 2, ..., T, then backward for t = T, ..., 2, 1), I think PyTorch's dynamic graph will allocate the CNN's memory separately for every time step.
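Here is a minimal sketch of the kind of loop I mean (all module names, shapes, and the dummy loss are placeholders, not my actual model):

```python
import torch
import torch.nn as nn

# Placeholder modules, only to show the loop structure.
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
lstm = nn.LSTMCell(input_size=8, hidden_size=8)
head = nn.Linear(8, 3 * 16 * 16)  # maps the hidden state back to a CNN-sized input

for p in cnn.parameters():
    p.requires_grad_(False)  # frozen, pre-trained stand-in

T = 10
h = torch.zeros(1, 8)
c = torch.zeros(1, 8)
x = torch.randn(1, 3, 16, 16)
loss = 0.0

# Conventional BPTT: forward for t = 1..T, then one backward over the whole graph.
for t in range(T):
    feat = cnn(x)                       # CNN runs once per time step
    h, c = lstm(feat, (h, c))
    x = head(h).view(1, 3, 16, 16)      # next CNN input depends on the LSTM output
    loss = loss + x.pow(2).mean()       # dummy per-step loss

# As far as I understand, the CNN activations of every time step are kept
# alive until this single backward call, so memory grows with T.
loss.backward()
```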
How can I handle this problem?
I would appreciate any answer or pointers.