PyTorch memory consumption

ptrblck · September 18, 2022, 9:12pm

You might be forgetting the intermediate activations, which are stored during the forward pass because the backward pass needs them to compute the gradients. This post describes a similar use case for a ResNet.
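To get a feeling for the scale, here is a rough back-of-the-envelope sketch in plain Python. It is an estimate, not a measurement. The toy network (three 3x3 convolutions with 64 output channels, stride 1 and padding 1), the batch size of 32, and the 224x224 input are assumptions made up for illustration, and the helper names are hypothetical. It also assumes float32 and that one output feature map per layer is kept for the backward pass; which tensors autograd actually saves depends on the operations used.

```python
# Rough estimate only: compare the memory held by the parameters with the
# memory held by activations that are kept for the backward pass.
# All shapes below are made-up assumptions, not taken from a real model.

BYTES_PER_FLOAT32 = 4


def conv_param_count(in_ch, out_ch, kernel):
    # Weights of a square conv kernel plus one bias per output channel.
    return out_ch * in_ch * kernel * kernel + out_ch


def activation_count(batch, channels, height, width):
    # Number of elements in one output feature map.
    return batch * channels * height * width


# Hypothetical toy network: (in_channels, out_channels) per 3x3 conv layer.
# With stride 1 and padding 1 the spatial size stays at height x width.
layers = [(3, 64), (64, 64), (64, 64)]
batch, height, width = 32, 224, 224

param_bytes = BYTES_PER_FLOAT32 * sum(
    conv_param_count(in_ch, out_ch, 3) for in_ch, out_ch in layers
)
act_bytes = BYTES_PER_FLOAT32 * sum(
    activation_count(batch, out_ch, height, width) for _, out_ch in layers
)

print(f"parameters:  {param_bytes / 2**20:.2f} MiB")
print(f"activations: {act_bytes / 2**20:.2f} MiB")
```

Under these assumptions the activations outweigh the parameters by several orders of magnitude, and unlike the parameters they grow linearly with the batch size.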