Out of curiosity, I would like to know:
when I apply some module to some batch, does autograd actually store the activations of all the layers somewhere?
It’s not that I want to access them, I would just like to know how it works.
It depends: for each operation (there might be several in a layer), autograd stores the inputs and/or outputs needed to compute the backward.
On newer PyTorch versions, the Node objects (.grad_fn / .grad_fn.next_functions) have “non-official” ._saved_* attributes where you can see what has been stored.
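To make that concrete, here is a small sketch of how one might peek at those attributes. The helper `saved_attrs` is a made-up name for illustration, and since the `_saved_*` attributes are unofficial, their names may differ between PyTorch versions.

```python
import torch

def saved_attrs(node):
    """List the unofficial `_saved_*` attribute names on an autograd Node.

    Hypothetical helper, not part of PyTorch.
    """
    return [name for name in dir(node) if name.startswith("_saved_")]

x = torch.randn(4, 3, requires_grad=True)
y = x.exp()    # the backward of exp reuses the output
z = x.pow(2)   # the backward of pow needs the input

# Show which tensors/values each op kept around for its backward.
print(type(y.grad_fn).__name__, saved_attrs(y.grad_fn))
print(type(z.grad_fn).__name__, saved_attrs(z.grad_fn))

# Walk one step back through the graph via next_functions.
# Each entry is a (node, input_index) pair; node can be None for
# inputs that do not require grad.
for parent, _ in z.grad_fn.next_functions:
    print(type(parent).__name__)
```

Note how the two ops differ: one keeps its output, the other keeps its input, which is the "inputs and/or outputs" part of the answer above.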
Let’s say we have some classical CNN, with each layer being the application of some conv relu batchnorm thing.
In that case, is a tensor of activations saved at each layer for backprop, or is something smarter done to save memory?
These are all pretty stupid questions, but it turns out I can’t readily find information on that topic
Each op decides what it wants for the backward.
(There is a tools/autograd/derivatives.yaml in the source with the definitions.) No particular smartness is applied by default, but people use a technique called checkpointing to save memory (by redoing parts of the computation during the backward).
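For reference, a minimal sketch of checkpointing with `torch.utils.checkpoint.checkpoint`, assuming a reasonably recent PyTorch (the `use_reentrant` argument is not available on old versions). The tiny block and the tensor shapes are made up for illustration. BatchNorm is left out on purpose: the checkpointed forward runs twice, so anything with side effects such as running statistics needs extra care.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# Illustrative block; layer sizes are arbitrary.
block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
)
x = torch.randn(2, 3, 16, 16, requires_grad=True)

# Regular forward/backward: intermediate activations are kept for backward.
block(x).sum().backward()
grad_plain = x.grad.clone()
x.grad = None
block.zero_grad()

# Checkpointed forward/backward: intermediates inside `block` are not kept;
# the block's forward is re-run during backward to recompute them.
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
grad_ckpt = x.grad.clone()

# The gradients should agree; only memory use and compute time change.
print(torch.allclose(grad_plain, grad_ckpt))
```

The trade-off is extra compute (one more forward pass through the checkpointed part) in exchange for not holding its activations in memory between forward and backward.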