Is autograd storing the activations somewhere?

aliutkus · September 11, 2021, 6:12pm

Out of curiosity, I would like to know:

When I apply a module to a batch, does autograd actually store the activations of all the layers somewhere?

It's not that I want to access them; I would just like to know.

Thanks, cheers

tom · September 11, 2021, 8:17pm

It depends: for each operation (there might be several in a layer), autograd stores the inputs and/or outputs needed to compute the backward.
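Here is a minimal sketch of how to watch this happen. It assumes PyTorch 1.10 or newer, where torch.autograd.graph.saved_tensors_hooks exists. The pack hook is called once for every tensor an op saves for its backward, so recording shapes there shows what a conv, batchnorm, ReLU block keeps alive. The layer sizes are arbitrary, and the exact list of saved tensors depends on the PyTorch version and backend.

```python
import torch
import torch.nn as nn

# A toy "layer" made of several ops; sizes are arbitrary.
block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

saved = []  # shape of every tensor autograd saves during the forward

def pack_hook(t):
    # Called each time an op saves a tensor for its backward.
    saved.append(tuple(t.shape))
    return t  # keep the tensor unchanged

def unpack_hook(t):
    # Called when the backward needs the saved tensor again.
    return t

x = torch.randn(4, 3, 16, 16)
with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    out = block(x)

# Typically includes the conv input (4, 3, 16, 16), the conv weight,
# batchnorm's input and statistics, and the ReLU output (4, 8, 16, 16).
for shape in saved:
    print(shape)

out.sum().backward()  # the saved tensors are consumed here
```

The hooks return the tensor unchanged, so this only observes. The same hooks can instead move saved tensors to CPU or compress them, which is one way of trading compute for memory.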
In newer PyTorch versions, the Node objects (.grad_fn / .grad_fn.next_functions) have undocumented (“non-official”) ._saved_… attributes where you can see what has been stored.
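Below is a small sketch of inspecting those attributes. It assumes a recent PyTorch. The _saved_* names (_saved_self, _saved_result, …) are private and may change between versions. The pow/exp pair illustrates the “inputs and/or outputs” point: pow saves its input, and exp saves its output. walk is a hypothetical helper written for this example, not a PyTorch function.

```python
import torch

x = torch.randn(5, requires_grad=True)

# pow saves its input: the backward of x**2 needs x (d/dx x^2 = 2x).
y = x.pow(2)
print(type(y.grad_fn).__name__)        # PowBackward0
print(x.equal(y.grad_fn._saved_self))  # True

# exp saves its output instead: d/dx exp(x) = exp(x).
z = x.exp()
print(z.equal(z.grad_fn._saved_result))  # True

# List the (undocumented) _saved_* attributes on a node.
print([name for name in dir(y.grad_fn) if name.startswith("_saved")])

def walk(fn, depth=0):
    # Hypothetical helper: follow next_functions from the output back to the leaves.
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

w = (x * 3).sum()
walk(w.grad_fn)  # SumBackward0 -> MulBackward0 -> AccumulateGrad (for the leaf x)
```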

Best regards

Thomas

aliutkus · September 12, 2021, 8:12am

Thanks.
Let's say we have a classical CNN where each layer is a conv, ReLU, batchnorm combination.

In that case, is a tensor of activations saved at each layer for backprop, or is something smarter done to save memory?

These are all pretty stupid questions, but it turns out I can't readily find information on that topic.

tom · September 17, 2021, 8:31pm

Each op decides what it saves for the backward.
(The definitions are in tools/autograd/derivatives.yaml in the source.) No particular smartness is applied by default, but people use a technique called checkpointing to save memory by redoing parts of the computation during the backward.
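Here is a minimal sketch of checkpointing with torch.utils.checkpoint.checkpoint. It assumes a PyTorch version that accepts the use_reentrant argument (1.11 or newer). stage1, stage2 and the sizes are illustrative. The checkpointed run should produce the same gradients. However, it does not keep stage1's intermediate activations between the forward and the backward, and it reruns stage1 during the backward instead.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# Two conv+ReLU stages. BatchNorm is left out on purpose: recomputing it
# during the backward would update its running statistics a second time.
stage1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
stage2 = nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())
params = list(stage1.parameters()) + list(stage2.parameters())
x = torch.randn(4, 3, 16, 16, requires_grad=True)

def collect_grads():
    return [p.grad.clone() for p in params] + [x.grad.clone()]

def zero_grads():
    for p in params:
        p.grad = None
    x.grad = None

# Plain forward: autograd keeps stage1's saved tensors until the backward.
stage2(stage1(x)).sum().backward()
plain = collect_grads()
zero_grads()

# Checkpointed forward: only the input x is kept for stage1; its intermediates
# are recomputed from x when the backward reaches stage1.
h = checkpoint(stage1, x, use_reentrant=False)
stage2(h).sum().backward()
ckpt = collect_grads()

print(all(torch.allclose(a, b) for a, b in zip(plain, ckpt)))
```

With only two small stages the saving is negligible. The payoff comes from checkpointing many blocks of a deep network, where only each block's input is stored.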

aliutkus · September 18, 2021, 8:31pm

Thanks for the answers!