Training VGG16 model with large batch size

Hi there,

I am trying to play with the ImageNet example here. I didn't change anything in the code. I am using the VGG16 model with a single GPU (a 1080 Ti), but the code consistently reports an out-of-memory error. Only after I reduce the batch size to 50 can I train and test the model. I was wondering, is this normal?

BTW, I use cudnn.benchmark=True. My OS is Ubuntu 16.04.
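
For reference, this is roughly what I am running (a minimal sketch only; I have swapped the ImageNet loader for dummy tensors, and the batch size of 50 is simply the largest that fit on my card in this setup):

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import DataLoader, TensorDataset

torch.backends.cudnn.benchmark = True  # as mentioned above

model = models.vgg16().cuda()
criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Dummy data stands in for the real ImageNet loader; batch_size=50 is the
# largest value that did not run out of memory on my 1080 Ti (11 GB).
dataset = TensorDataset(torch.randn(200, 3, 224, 224),
                        torch.randint(0, 1000, (200,)))
loader = DataLoader(dataset, batch_size=50, shuffle=True, num_workers=4)

for images, targets in loader:
    images, targets = images.cuda(), targets.cuda()
    output = model(images)
    loss = criterion(output, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```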

Thanks

Yes, this is normal. The VGG models have lots of parameters. You can get a rough idea of the size by creating the pre-trained model and seeing how big its parameter file is relative to the parameter files of other models.
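
For example, a quick back-of-the-envelope check along these lines gives a sense of the scale (a sketch; it only counts parameters and ignores activations, gradients, and optimizer state, which also consume GPU memory during training):

```python
import torchvision.models as models

model = models.vgg16()
n_params = sum(p.numel() for p in model.parameters())

# Each float32 parameter takes 4 bytes; VGG16 is on the order of 138M parameters,
# so the weights alone are roughly half a gigabyte before any activations.
print(f"VGG16 parameters: {n_params / 1e6:.1f}M "
      f"(~{n_params * 4 / 1e6:.0f} MB as float32)")
```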

Thanks a lot for your reply. So if I want to train with a batch size of 256, I need to run it on 4 or more GPUs, right? Thank you.

I'm not sure how many GPUs you'll need for a batch size of 256. Play around with the batch size and check your GPU memory consumption using nvidia-smi.
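
If you also want to check from inside the script, something along these lines should work (a sketch: nn.DataParallel splits each batch across the visible GPUs, so a batch of 256 on 4 cards puts roughly 64 images on each one, and torch.cuda.memory_allocated / max_memory_allocated report what the caching allocator is holding):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# DataParallel replicates the model and splits each input batch
# across all visible GPUs.
model = nn.DataParallel(models.vgg16().cuda())

images = torch.randn(256, 3, 224, 224).cuda()  # ~64 images per card on 4 GPUs
output = model(images)

# Allocator counters for the current device; complements what nvidia-smi shows.
print(torch.cuda.memory_allocated() / 1e9, "GB currently allocated")
print(torch.cuda.max_memory_allocated() / 1e9, "GB peak allocated")
```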

Hello,

The PyTorch VGG model was trained on which dataset? ImageNet?

Yes, ImageNet.

I have also tried to fine-tune the model on other datasets, such as the Places dataset and UCF101, but the batch size has to be smaller than 55 for the model to run.
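
Roughly what I do for the fine-tuning is the following (a sketch, assuming torchvision's vgg16 and that only the number of target classes changes; 365 here is the Places365 class count, and 101 would be the choice for UCF101 frame-level classification):

```python
import torch.nn as nn
import torchvision.models as models

num_classes = 365  # e.g. Places365; 101 for UCF101

model = models.vgg16(pretrained=True)

# Replace the final fully connected layer so the output size
# matches the new dataset instead of ImageNet's 1000 classes.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
model = model.cuda()

# On a single 1080 Ti this model only runs with a DataLoader
# batch_size below about 55, as mentioned above.
```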

Thank you very much. It's helpful.