Fluctuating memory consumption during training

ShaoMinLiu-Holmusk · February 20, 2022, 8:19am

Has anybody had the issue of memory usage spiking during training? I am fine-tuning a Sentence-BERT model from https://github.com/UKPLab/sentence-transformers.

I have noticed that the model's memory use stays below 2 GB for a few epochs, but then at some point it suddenly spikes and training crashes with an Out of Memory error.

I have about 8 GB of memory on my GPU. Because of this issue, I have to reduce my batch size so that training uses less than 2 GB most of the time, just to avoid an OOM in case of a memory spike.

This is definitely underutilising resources. Has anybody faced a similar situation before, and what could be the reason?
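To narrow down where the spikes come from, one option is to log the peak GPU memory after every training step together with the longest sequence in that batch, and then look for steps that stand out. The snippet below is only a rough sketch under assumptions: it assumes a PyTorch training loop where a per-step hook is available, and the helper names `read_peak_mib` and `find_spikes`, the record layout and the `factor` threshold are all made up for illustration. Whether long sequences are actually the cause is a guess to be checked, not something established here.

```python
from statistics import median


def read_peak_mib():
    """Return peak allocated GPU memory (MiB) since the last reset.

    Falls back to 0.0 when PyTorch or CUDA is not available, so the
    rest of the script can still run on a CPU-only machine.
    """
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    peak = torch.cuda.max_memory_allocated() / 2**20
    # Reset so the next call reports the peak of the next step only.
    torch.cuda.reset_peak_memory_stats()
    return peak


def find_spikes(records, factor=2.0):
    """Return the records whose peak memory exceeds factor * median peak.

    records: list of (step, max_tokens, peak_mib) tuples (hypothetical layout).
    """
    if not records:
        return []
    typical = median(r[2] for r in records)
    return [r for r in records if r[2] > factor * typical]


# Hypothetical values, only to show the shape of the data:
# (step, longest sequence in the batch, peak memory in MiB)
example = [(0, 32, 1800.0), (1, 30, 1750.0), (2, 512, 7900.0), (3, 28, 1700.0)]
print(find_spikes(example))
```

In a real run, `read_peak_mib()` would be called once after each optimizer step and appended to the record list along with the batch's longest token length. If the flagged steps consistently have unusually long inputs, capping the maximum sequence length may be worth trying before lowering the batch size further.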