It depends: for each operation (there might be several in a layer), autograd stores the inputs and/or outputs needed to compute the backward.
On newer PyTorch, the backward Node objects (.grad_fn / .grad_fn.next_functions) expose unofficial `_saved_*` attributes where you can see what has been stored.
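As a small sketch (the `_saved_*` names are not a stable API and can change between releases), you can poke at a couple of simple ops and see that exp keeps its output while sin keeps its input:

```python
import torch

x = torch.randn(3, requires_grad=True)

y = torch.exp(x)   # d/dx exp(x) = exp(x), so the *output* is saved
z = torch.sin(x)   # d/dx sin(x) = cos(x), so the *input* is saved

print(y.grad_fn)                # <ExpBackward0 ...>
print(y.grad_fn._saved_result)  # the saved output of exp
print(z.grad_fn)                # <SinBackward0 ...>
print(z.grad_fn._saved_self)    # the saved input of sin
```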
Each op decides what it wants for the backward.
(There is a tools/autograd/derivatives.yaml in the source with the definitions.) No particular smartness is applied by default, but people use a technique called checkpointing to save memory (by redoing parts of the computation during the backward pass).
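A minimal checkpointing sketch with torch.utils.checkpoint (the `block` function here is just a made-up example): instead of storing the intermediates of `block`, autograd stores only its inputs and re-runs it during backward.

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # hypothetical sub-network whose intermediates we don't want to keep
    return torch.relu(x @ x.t()).sum(dim=1)

x = torch.randn(128, 128, requires_grad=True)

# Plain call: intermediates of `block` are saved for the backward.
out = block(x)

# Checkpointed call: only the input is saved; `block` is recomputed
# when .backward() needs its intermediates.
out_ckpt = checkpoint(block, x, use_reentrant=False)
out_ckpt.sum().backward()
```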