I’m trying to train an instance segmentation model on a GPU, following this PyTorch tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html.
However, after a few epochs (3 with a batch_size of 1, 6 with a batch_size of 2), all of the available RAM is consumed and training stops.
How can this be fixed?
My GPU has 16 GB of memory, my PC has 32 GB of RAM, and I am trying to train the model on ~5000 images.
The rise in RAM usage seems to happen during losses.backward() in the train_one_epoch function from the tutorial's engine.py helper (available here).
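For context, this is roughly what the relevant part of train_one_epoch looks like in the tutorial's engine.py (a simplified sketch; metric logging and the warmup LR scheduler are omitted):

```python
# Simplified sketch of the training loop from the tutorial's engine.py
# (metric logging and the warmup LR scheduler are left out).
def train_one_epoch(model, optimizer, data_loader, device):
    model.train()
    for images, targets in data_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In training mode, torchvision detection models return a dict of losses.
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        optimizer.zero_grad()
        losses.backward()  # this is where RAM usage climbs for me
        optimizer.step()
```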