PyTorch memory consumption

I am training a vision CNN model (MoViNetA2, specifically this particular implementation). The model has 4.1M parameters, which comes out to about 16.5 MB of model size. For training I use input tensors of size (4, 3, 50, 224, 224), which consume around 122 MB. However, torch.cuda.memory_summary() reports over 16 GB of GPU memory allocated between pred = model(x) and loss.backward().

Can somebody clarify why PyTorch uses that much GPU memory? Is there a way to decrease memory consumption?
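For anyone landing here with the same question: you can measure where the memory goes yourself. The sketch below (a toy 3D CNN and a much smaller input than yours, purely for illustration) uses forward hooks to sum the bytes of every layer output. Those outputs are exactly the intermediate activations autograd keeps alive for the backward pass, and even in this tiny setup they dwarf the parameter memory:

```python
import torch
import torch.nn as nn

# Hypothetical small 3D CNN standing in for MoViNetA2 -- the point is
# comparing parameter memory against activation memory, not the exact model.
model = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)

activation_bytes = 0

def count_activation(module, inputs, output):
    # Every layer output is stored for the gradient computation,
    # so its size counts toward training memory.
    global activation_bytes
    activation_bytes += output.numel() * output.element_size()

for layer in model:
    layer.register_forward_hook(count_activation)

# A much smaller input than (4, 3, 50, 224, 224) so this runs quickly on CPU.
x = torch.randn(1, 3, 8, 64, 64)
model(x)

param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameters : {param_bytes / 1e6:.2f} MB")
print(f"activations: {activation_bytes / 1e6:.2f} MB")
```

Scale the input back up to (4, 3, 50, 224, 224) and a few dozen layers, and the multi-GB figure stops being surprising.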

You might be forgetting the intermediate activations, which need to be stored for the gradient computation. This post describes a similar use case for a ResNet.
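To add a concrete mitigation: gradient checkpointing trades compute for memory by discarding intermediate activations during the forward pass and recomputing them during backward. A minimal sketch with torch.utils.checkpoint.checkpoint_sequential on a toy layer stack (an assumption, not the actual MoViNet code, which may need per-block checkpointing instead):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stack of layers standing in for the model's blocks.
model = nn.Sequential(
    nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
)

x = torch.randn(1, 3, 8, 32, 32, requires_grad=True)

# Split the sequential model into 2 checkpointed segments: only segment
# boundaries keep their activations; the rest are recomputed in backward.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.mean().backward()

print(x.grad.shape)  # gradients still flow as usual
```

Other common levers: reduce the batch size (or the 50-frame clip length) and use gradient accumulation, or train with mixed precision via torch.cuda.amp, which roughly halves activation memory.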

Ok, that makes sense. Thank you very much.