Besides the input data and the model’s parameters and buffers, the intermediate forward activations can use a lot of memory, depending on the model architecture. I don’t know how you measured the memory usage, but this post explains it in more detail for a ResNet.
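As a rough illustration of why activations can dominate, here is a back-of-the-envelope sketch in plain Python. It compares the weight memory of a single convolution with the memory of its output activation. The layer shape (3 to 64 channels, 7x7 kernel, stride 2, padding 3), the 224x224 input, the batch size of 32, and float32 storage are all assumptions I picked to resemble the first conv of a ResNet-style model. The helper names are made up for this example and are not part of any library. It also ignores bias, dilation, and any extra workspace the backend might allocate.

```python
def conv2d_output_hw(h, w, kernel, stride=1, padding=0):
    """Spatial output size of a 2D convolution (assumes dilation 1)."""
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return h_out, w_out


def conv2d_memory_bytes(batch, c_in, c_out, h, w, kernel,
                        stride=1, padding=0, bytes_per_elem=4):
    """Return (weight_bytes, output_activation_bytes) for one conv layer.

    Hypothetical helper: counts only the weight tensor (no bias) and the
    output activation, with float32 (4 bytes per element) by default.
    """
    h_out, w_out = conv2d_output_hw(h, w, kernel, stride, padding)
    weight_bytes = c_out * c_in * kernel * kernel * bytes_per_elem
    activation_bytes = batch * c_out * h_out * w_out * bytes_per_elem
    return weight_bytes, activation_bytes


# Assumed example: ResNet-style first conv, batch of 32 RGB images at 224x224.
weight_bytes, activation_bytes = conv2d_memory_bytes(
    batch=32, c_in=3, c_out=64, h=224, w=224, kernel=7, stride=2, padding=3
)

print(f"weights:     {weight_bytes / 1024**2:.2f} MiB")
print(f"activations: {activation_bytes / 1024**2:.2f} MiB")
```

Under these assumptions the weights take well under 1 MiB, while the output activation of that one layer is on the order of 100 MiB, and it scales linearly with the batch size. A real measurement on your model could look different, so treat this only as an estimate of the order of magnitude.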