Why is memory repeatedly allocated and freed during the backward pass?

With my limited insight into PyTorch's autograd functionality, I believed that a computational graph was dynamically created during the forward pass, consuming a lot of memory, and that this same memory was then gradually freed during the backward pass, where the actual gradients are computed.

However, when I monitor the memory usage during a forward and backward pass, I see that the forward pass consumes almost no memory, while the backward pass repeatedly consumes and frees memory in a cyclical manner. I am running on the CPU.
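For reference, this is a minimal sketch of the kind of memory monitoring I mean (the model and sizes are just placeholders, and it assumes psutil is available):

```python
import os

import psutil
import torch
import torch.nn as nn

proc = psutil.Process(os.getpid())

def rss_mb():
    # Resident set size of this process, in MB
    return proc.memory_info().rss / 1024 ** 2

# Placeholder model and batch -- the real network doesn't matter for the pattern
model = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(12)])
x = torch.randn(256, 2048)

# Print memory after each layer's backward to see the cyclical allocate/free pattern
for name, layer in model.named_children():
    layer.register_full_backward_hook(
        lambda mod, gin, gout, n=name: print(f"backward through layer {n}: {rss_mb():.0f} MB")
    )

print(f"before forward: {rss_mb():.0f} MB")
out = model(x).sum()
print(f"after forward:  {rss_mb():.0f} MB")
out.backward()
print(f"after backward: {rss_mb():.0f} MB")
```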

Could someone please explain to me why I experience this behaviour?

To save memory, the backward pass recomputes some parts. The forward pass doesn't store much, so you can use large batches without worrying about RAM/GPU-RAM issues. Then, during the backward pass, you compute only what you need from the few values that were stored, and you keep only the gradients needed for the backpropagation algorithm.
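As a concrete illustration, torch.utils.checkpoint lets you apply this recompute idea explicitly: activations inside a checkpointed segment are dropped during the forward pass and recomputed when backward reaches that segment. A minimal sketch (the model is just a placeholder, and use_reentrant=False assumes a reasonably recent PyTorch version):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder network -- the exact layers don't matter for the idea
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(16)])
x = torch.randn(32, 1024, requires_grad=True)

# Plain forward/backward: every layer's input is saved for the backward pass
model(x).sum().backward()

# Checkpointed forward/backward: only the activations at the 4 segment
# boundaries are kept; everything inside a segment is recomputed during backward
x.grad = None
model.zero_grad(set_to_none=True)
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.sum().backward()
```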

I hope that's clear; don't hesitate to ask for a more detailed explanation :slight_smile:

Thanks for your reply!

So does the backward pass automatically apply checkpointing, or is it some other kind of memory-saving technique?