I am training a vision CNN model (MoViNetA2 to be specific, this particular implementation). The model has 4.1M parameters, which works out to about 16.5 MB on disk. For training I use input tensors of size (4, 3, 50, 224, 224), which consume around 122 MB. However, torch.cuda.memory_summary() reports over 16 GB of GPU memory allocated between pred = model(x) and loss.backward().
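For reference, the sizes I quoted above are just element counts times bytes per element; here is the rough arithmetic I used (assuming float32, i.e. 4 bytes per element):

```python
def tensor_mib(shape, bytes_per_element=4):
    """Memory footprint of a dense float32 tensor in MiB."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_element / 2**20

# Input batch (4, 3, 50, 224, 224): ~115 MiB (~120 MB)
print(tensor_mib((4, 3, 50, 224, 224)))

# ~4.1M float32 parameters: ~16 MiB (~16 MB)
print(tensor_mib((4_100_000,)))
```

Neither of these comes anywhere near 16 GB, which is what confuses me.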
Can somebody clarify why PyTorch uses that much GPU memory? Is there a way to decrease memory consumption?