Where are the activations and gradients stored?

I am planning to implement a network that does not store the activations/gradients for a couple of layers and instead recomputes them on the fly during the backward pass. I am currently using convolutional layers from torch.nn. How and where should I change the implementation so that activations and gradients are not stored for every layer?

I think you are looking for torch.utils.checkpoint
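
In case it helps, here is a minimal sketch of how that could look with convolutional layers. The module name `CheckpointedConvNet`, the layer sizes, and the choice of which block to checkpoint are all made up for illustration; `use_reentrant=False` assumes a reasonably recent PyTorch release. The idea is that the wrapped block does not keep its intermediate activations after the forward pass and runs its forward again when backward reaches it.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedConvNet(nn.Module):
    """Hypothetical example network; the layer sizes are arbitrary."""

    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(8, 4, 3, padding=1)

    def forward(self, x):
        # Regular layer: its activations are saved for backward as usual.
        x = self.block1(x)
        # Checkpointed block: intermediate activations inside block2 are not
        # kept; block2's forward is re-run during the backward pass instead.
        x = checkpoint(self.block2, x, use_reentrant=False)
        return self.head(x)


model = CheckpointedConvNet()
inp = torch.randn(2, 3, 16, 16)
out = model(inp)

# Backward works as normal; the recomputation happens behind the scenes.
out.sum().backward()
```

This trades extra compute for lower activation memory. Parameter gradients still end up in each parameter's `.grad` as usual, since the optimizer needs them; gradients with respect to intermediate activations are not retained by autograd by default anyway.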


Yes, this is very relevant, thank you!