Does GPU memory usage scale roughly linearly with the batch size used in training?
I am fine-tuning ResNet-152. With a batch size of 8, total GPU memory usage is around 4 GB, and when I increase the batch size to 16, it rises to around 6 GB. The model itself takes about 2 GB. So it seems to me that the GPU memory consumption of training ResNet-152 is approximately 2 GB + 2 GB * batch_size / 8. Is that right?
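To make the arithmetic explicit, here is a minimal sketch that fits a straight line through my two measurements and extrapolates to another batch size. The function names (`fit_linear_memory`, `estimate_memory_gb`) are made up for illustration, and the whole thing assumes the linear model holds, which is exactly what I am unsure about. Two points always fit a line exactly, so this does not confirm linearity by itself.

```python
def fit_linear_memory(bs_a, mem_a, bs_b, mem_b):
    """Fit memory = fixed + per_sample * batch_size through two measurements."""
    per_sample = (mem_b - mem_a) / (bs_b - bs_a)  # GB added per extra sample
    fixed = mem_a - per_sample * bs_a             # GB used regardless of batch size
    return fixed, per_sample


# My two observations: (batch size, approximate total GPU memory in GB).
fixed_gb, per_sample_gb = fit_linear_memory(8, 4.0, 16, 6.0)


def estimate_memory_gb(batch_size):
    """Extrapolate total GPU memory under the assumed linear model."""
    return fixed_gb + per_sample_gb * batch_size


print(fixed_gb, per_sample_gb)   # fixed cost and per-sample cost in GB
print(estimate_memory_gb(32))    # prediction for batch size 32
```

The fitted fixed cost comes out at 2 GB, which matches the size of the model itself, and the per-sample cost is 0.25 GB, the same as 2 GB * batch_size / 8. Measuring a third batch size and comparing it with the prediction would be a simple way to check whether the relationship is really linear.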