Hi, I am trying to calculate the maximum memory usage during the backward pass. I understand that intermediate results can be freed during the backward() call, but I am wondering whether the intermediate feature maps and their gradients are freed only after the whole backward pass has finished, or as soon as they are no longer needed.
For example, if I have a simple 3-layer CNN, will the intermediate result of the second layer and its gradient already have been freed by the time the gradient of the first layer is computed?
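To make the question concrete, here is a minimal sketch of the kind of setup I have in mind and how I would try to observe the memory during backward(). The layer sizes, the variable names (`h1`, `h2`, `log`) and the `record` helper are made up for illustration. The memory readings are only meaningful on a CUDA device; on CPU the script just logs the order in which the gradients become available.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical 3-layer CNN; channel counts and kernel sizes are arbitrary.
conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
conv2 = nn.Conv2d(8, 8, kernel_size=3, padding=1).to(device)
conv3 = nn.Conv2d(8, 8, kernel_size=3, padding=1).to(device)

log = []  # (tensor name, bytes allocated on the GPU when its gradient arrives)

def record(name):
    # Returns a tensor hook that logs the current allocation and leaves
    # the gradient unchanged (a hook returning None does not modify it).
    def hook(grad):
        allocated = torch.cuda.memory_allocated() if device == "cuda" else None
        log.append((name, allocated))
    return hook

x = torch.randn(4, 3, 32, 32, device=device)

# Forward pass, keeping handles to the intermediate feature maps.
h1 = torch.relu(conv1(x))
h2 = torch.relu(conv2(h1))
out = conv3(h2)

# Each hook fires when the gradient w.r.t. that feature map has been computed.
h1.register_hook(record("h1"))
h2.register_hook(record("h2"))

if device == "cuda":
    torch.cuda.reset_peak_memory_stats()

out.sum().backward()

for name, allocated in log:
    print(name, allocated)
if device == "cuda":
    print("peak bytes during backward:", torch.cuda.max_memory_allocated())
```

Note that holding `h1` and `h2` as Python variables keeps those feature maps alive in this script, so it shows when each gradient arrives and how the allocation changes, not by itself when autograd releases its saved tensors.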
Thanks!