Training VGG16 model with large batch size

Hi there,

I am trying to play with the ImageNet example here. I didn't change anything in the code. I am using the VGG16 model with a single GPU (a 1080 Ti), but the code consistently reports an out-of-memory error. Only after I reduce the batch size to 50 can I train and test the model. I was wondering, is this normal?

BTW, I use cudnn.benchmark=True. My OS is Ubuntu 16.04.
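
For reference, this is roughly what I am running (a minimal sketch only; I have swapped the ImageNet loader for dummy tensors, and the batch size of 50 is simply the largest that fit on my card in this setup):

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import DataLoader, TensorDataset

torch.backends.cudnn.benchmark = True  # as mentioned above

model = models.vgg16().cuda()
criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Dummy data stands in for the real ImageNet loader; batch_size=50 is the
# largest value that did not run out of memory on my 1080 Ti (11 GB).
dataset = TensorDataset(torch.randn(200, 3, 224, 224),
                        torch.randint(0, 1000, (200,)))
loader = DataLoader(dataset, batch_size=50, shuffle=True, num_workers=4)

for images, targets in loader:
    images, targets = images.cuda(), targets.cuda()
    output = model(images)
    loss = criterion(output, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```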

Thanks

Yes, this is normal. The VGG models have lots of parameters. You can get a rough idea of the size by creating the pre-trained model and seeing how big its parameter file is relative to the parameter files of other models.
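
For example, a quick back-of-the-envelope check along these lines gives a sense of the scale (a sketch; it only counts parameters and ignores activations, gradients, and optimizer state, which also consume GPU memory during training):

```python
import torchvision.models as models

model = models.vgg16()
n_params = sum(p.numel() for p in model.parameters())

# Each float32 parameter takes 4 bytes; VGG16 is on the order of 138M parameters,
# so the weights alone are roughly half a gigabyte before any activations.
print(f"VGG16 parameters: {n_params / 1e6:.1f}M "
      f"(~{n_params * 4 / 1e6:.0f} MB as float32)")
```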

Thanks a lot for your reply. So if I want to train with a batch size of 256, I need to run it on 4 or more GPUs, right? Thank you.

I'm not sure how many GPUs you'll need for a batch size of 256. Play around with the batch size and check your GPU memory consumption using nvidia-smi.
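
If you also want to check from inside the script, something along these lines should work (a sketch: nn.DataParallel splits each batch across the visible GPUs, so a batch of 256 on 4 cards puts roughly 64 images on each one, and torch.cuda.memory_allocated / max_memory_allocated report what the caching allocator is holding):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# DataParallel replicates the model and splits each input batch
# across all visible GPUs.
model = nn.DataParallel(models.vgg16().cuda())

images = torch.randn(256, 3, 224, 224).cuda()  # ~64 images per card on 4 GPUs
output = model(images)

# Allocator counters for the current device; complements what nvidia-smi shows.
print(torch.cuda.memory_allocated() / 1e9, "GB currently allocated")
print(torch.cuda.max_memory_allocated() / 1e9, "GB peak allocated")
```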

Hello,

The PyTorch VGG model was trained on which dataset? ImageNet?

Yes, ImageNet.

I have also tried to fine-tune the model on other datasets, such as the Places dataset and UCF101, but the batch size has to be smaller than 55 for the model to run.
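
Roughly what I do for the fine-tuning is the following (a sketch, assuming torchvision's vgg16 and that only the number of target classes changes; 365 here is the Places365 class count, and 101 would be the choice for UCF101 frame-level classification):

```python
import torch.nn as nn
import torchvision.models as models

num_classes = 365  # e.g. Places365; 101 for UCF101

model = models.vgg16(pretrained=True)

# Replace the final fully connected layer so the output size
# matches the new dataset instead of ImageNet's 1000 classes.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
model = model.cuda()

# On a single 1080 Ti this model only runs with a DataLoader
# batch_size below about 55, as mentioned above.
```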

Thank you very much. It's helpful.