Hi, we’re trying to train a recurrent model that takes a sequence of images (i.e., a video) as input and gives an output at every frame. We would like the gradients of the hidden state to flow all the way back through the video, but there is not nearly enough GPU memory to store all of the saved tensors. Is there a good way to move these saved tensors into CPU memory as needed? The only way I can see right now is using a custom autograd.Function, but I need something that works with built-in modules as well.
You can easily move Tensors between CPU and GPU, since these operations are differentiable in PyTorch.
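To illustrate, here is a minimal sketch showing that a device transfer is part of the autograd graph, so gradients flow back through it to the original leaf (it falls back to a CPU-to-CPU move when no GPU is available):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(3, requires_grad=True)  # leaf tensor on CPU
y = x.to(device)                        # device transfer is recorded by autograd
loss = (y * 2).sum()
loss.backward()                         # gradient flows back through the transfer
print(x.grad)                           # tensor([2., 2., 2.])
```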
The problem, though, is that you won’t be able to move the buffers contained in the computational graph: there is no common interface for how each Function stores them, so it would have to be handled as a special case for each one.
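As a sketch of the custom autograd.Function route mentioned in the question, the (hypothetical) example below offloads its own saved tensors to CPU in `forward` and copies them back in `backward`. Note the limitation described above still applies: this only covers the buffers of this one Function, not those kept internally by built-in modules.

```python
import torch

class OffloadedMatMul(torch.autograd.Function):
    """Matrix multiply that stashes its inputs in CPU memory
    instead of keeping them on the original device for backward."""

    @staticmethod
    def forward(ctx, x, w):
        ctx.device = x.device
        # Offload the tensors needed for backward to CPU memory.
        ctx.save_for_backward(x.to("cpu"), w.to("cpu"))
        return x @ w

    @staticmethod
    def backward(ctx, grad_out):
        # Bring the saved tensors back to the compute device on demand.
        x, w = ctx.saved_tensors
        x, w = x.to(ctx.device), w.to(ctx.device)
        return grad_out @ w.t(), x.t() @ grad_out

# Usage: behaves like x @ w, but backward buffers live on CPU.
x = torch.randn(2, 3, requires_grad=True)
w = torch.randn(3, 4, requires_grad=True)
out = OffloadedMatMul.apply(x, w)
out.sum().backward()
```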