High memory cost when training VGG

I compared the memory cost of Caffe and PyTorch on a single GeForce GTX 1070.
I used the ImageNet classification task for this test. The PyTorch code in this test is just the example code.

  1. AlexNet (GPU memory in MB)

batch size         256    512    768   1024
caffe test        2443   4399   6365    OoM
caffe train       4229   7425    OoM    OoM
pytorch test      1855   2817   3167   3655
pytorch train     4803   4919   7041    OoM

This result looks very good for PyTorch: in most cases it uses roughly 2/3 of the memory Caffe uses. But when I ran the same test with VGG16, PyTorch's memory cost was higher than Caffe's when training:

  2. VGG16 (GPU memory in MB)

batch size          32     42     48     64
caffe test        2931   3607   3983   5025
caffe train       6015   7007   7995    OoM
pytorch test      2173   2649   2795   3655
pytorch train     6447   6907    OoM    OoM

Note: OoM stands for ‘out of memory’.
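To put numbers on "roughly 2/3" and on the VGG16 training gap, the ratio of PyTorch memory to Caffe memory can be computed from the two tables wherever both frameworks fit. This is a plain-Python sketch; the figures are copied from the tables, `None` stands for OoM, and the helper name `pytorch_to_caffe_ratio` is just illustrative.

```python
# GPU memory (MB) copied from the AlexNet and VGG16 tables; None marks OoM.
BATCHES = {
    "alexnet": [256, 512, 768, 1024],
    "vgg16": [32, 42, 48, 64],
}
MEMORY = {
    ("alexnet", "caffe", "test"): [2443, 4399, 6365, None],
    ("alexnet", "caffe", "train"): [4229, 7425, None, None],
    ("alexnet", "pytorch", "test"): [1855, 2817, 3167, 3655],
    ("alexnet", "pytorch", "train"): [4803, 4919, 7041, None],
    ("vgg16", "caffe", "test"): [2931, 3607, 3983, 5025],
    ("vgg16", "caffe", "train"): [6015, 7007, 7995, None],
    ("vgg16", "pytorch", "test"): [2173, 2649, 2795, 3655],
    ("vgg16", "pytorch", "train"): [6447, 6907, None, None],
}


def pytorch_to_caffe_ratio(net, phase):
    """Return {batch_size: pytorch_mb / caffe_mb} where neither run hit OoM."""
    caffe = MEMORY[(net, "caffe", phase)]
    pytorch = MEMORY[(net, "pytorch", phase)]
    return {
        batch: round(p / c, 2)
        for batch, c, p in zip(BATCHES[net], caffe, pytorch)
        if c is not None and p is not None
    }


for net in BATCHES:
    for phase in ("test", "train"):
        print(net, phase, pytorch_to_caffe_ratio(net, phase))
```

This gives 0.5 to 0.76 for AlexNet testing, 1.14 and 0.66 for AlexNet training at batch sizes 256 and 512, and 1.07 and 0.99 for VGG16 training at batch sizes 32 and 42, so the VGG16 training gap is small right up to the point where PyTorch hits OoM at batch size 48.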

Does anyone have an idea why PyTorch uses more memory than Caffe when training VGG16?

Another observation: before training starts, the memory cost shown by nvidia-smi first bursts to a high level (approximately 20% higher than the later value) and then settles at a lower value. This is likely to cause a ‘CUDNN_STATUS_ALLOC_FAILED’ error. Is this normal?
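One way to check whether the start-of-training burst comes from the first iteration is to record PyTorch's own peak-memory counters for each training step. This is a minimal sketch, assuming a recent PyTorch (`torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated`) and random tensors in place of ImageNet data; `per_step_peak_mb` is a hypothetical helper, not part of any library. A plausible cause of a first-step burst with `cudnn.benchmark=True` is cuDNN trying candidate algorithms with their workspaces, but treat that as an assumption to verify. These counters only cover PyTorch's caching allocator, so nvidia-smi will report more (CUDA context, cached-but-free blocks).

```python
import torch
import torch.nn as nn


def per_step_peak_mb(model, batch_size, steps=3, image_size=224,
                     num_classes=1000, benchmark=True):
    """Peak memory (MB) allocated by PyTorch during each of `steps` training
    iterations, or None when no CUDA device is present.

    Hypothetical helper: random tensors stand in for ImageNet images/labels.
    Note: this sets the global torch.backends.cudnn.benchmark flag.
    """
    if not torch.cuda.is_available():
        return None
    torch.backends.cudnn.benchmark = benchmark
    device = torch.device("cuda")
    model = model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    y = torch.randint(0, num_classes, (batch_size,), device=device)

    peaks = []
    for _ in range(steps):
        # Restart the peak counter from the memory currently in use.
        torch.cuda.reset_peak_memory_stats(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize(device)  # make sure the step has finished
        peaks.append(torch.cuda.max_memory_allocated(device) / 2**20)
    return peaks
```

Calling it with, for example, `torchvision.models.vgg16()` and comparing `peaks[0]` with the later entries would show whether the first step is the expensive one; repeating with `benchmark=False` shows how much of it is tied to the algorithm search.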


Can I see your benchmark script?

For batch size 42, the difference in memory usage is minor: 6907 MB (PyTorch) vs 7007 MB (Caffe). I expect that at batch size 48 it tips one way or the other (PyTorch is probably picking a different cuDNN algorithm).

Yes, you’re right!
In the test above, I had set cudnn.benchmark to True.
I just tried setting it to False, and the memory cost drops.
With VGG16, the maximum batch size now increases to 60 (GPU memory in MB):

batch size          48     52     56     60
pytorch train     7657   8115   7409   7659
Thank you! @smth
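The original benchmark script isn't shown in the thread, so here is a sketch of how the `cudnn.benchmark` True/False comparison could be scripted as "largest batch size that trains without running out of memory". Assumptions: a recent PyTorch (`torch.cuda.reset_peak_memory_stats`, `torch.cuda.max_memory_reserved`), random tensors instead of ImageNet batches, and OOM detected by the error message text; `peak_train_mb` and `max_batch_size` are hypothetical helpers. Every batch size is tried rather than stopping at the first failure, since memory use need not grow monotonically with batch size.

```python
import gc

import torch
import torch.nn as nn
import torch.nn.functional as F


def _train_step(make_model, batch_size, image_size, num_classes):
    """One SGD step on random data (a stand-in for ImageNet batches)."""
    device = torch.device("cuda")
    model = make_model().to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    y = torch.randint(0, num_classes, (batch_size,), device=device)
    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()
    optimizer.step()
    torch.cuda.synchronize(device)


def peak_train_mb(make_model, batch_size, benchmark,
                  image_size=224, num_classes=1000):
    """Peak memory (MB) reserved by PyTorch for one training step, or None if
    the step runs out of GPU memory. Requires a CUDA device.

    Note: sets the global torch.backends.cudnn.benchmark flag.
    """
    torch.backends.cudnn.benchmark = benchmark
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    try:
        _train_step(make_model, batch_size, image_size, num_classes)
    except RuntimeError as err:  # torch.cuda.OutOfMemoryError subclasses this
        msg = str(err)
        if "out of memory" not in msg and "CUDNN_STATUS_ALLOC_FAILED" not in msg:
            raise
        peak = None
    else:
        peak = torch.cuda.max_memory_reserved() / 2**20
    gc.collect()
    torch.cuda.empty_cache()  # release cached blocks before the next run
    return peak


def max_batch_size(make_model, batch_sizes, benchmark,
                   image_size=224, num_classes=1000):
    """Largest batch size in `batch_sizes` whose training step fits in memory.

    Returns None if no CUDA device is available or no batch size fits.
    """
    if not torch.cuda.is_available():
        return None
    best = None
    for bs in sorted(batch_sizes):
        if peak_train_mb(make_model, bs, benchmark, image_size, num_classes) is not None:
            best = bs
    return best
```

For example, `max_batch_size(torchvision.models.vgg16, [48, 52, 56, 60], benchmark=False)` versus the same call with `benchmark=True` (torchvision is an extra dependency). Because these numbers come from PyTorch's allocator counters, they will be lower than what nvidia-smi reports, which also includes the CUDA context.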


@Soumith_Chintala I stumbled upon this thread and observed the opposite: in my case, the memory footprint of the PyTorch VGG16 model increases when I set cudnn.benchmark=False. With a batch size of 64, the PyTorch VGG16 model occupies 9978 MB of GPU memory. Is this normal, or is there a way to reduce it?

My understanding is that, because PyTorch selects the cuDNN algorithm automatically, it simply uses more memory when more memory is available. So it may not be fair to compare memory cost without considering the cuDNN algorithm.
In my comparison above, you can see that batch size 52 uses more memory than batch size 56; because my memory limit is 8 GB, there is presumably less room left at batch size 56 for a memory-hungry algorithm. So I think a more reasonable comparison metric is ‘the maximum batch size for a given memory size’.
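The ‘maximum batch size for a given memory size’ metric can be read straight off the VGG16 training runs in this thread, since they were all on the same 8 GB card. A plain-Python sketch; the figures are copied from the tables above (`None` marks OoM) and `max_batch_that_fit` is an illustrative name. The tested batch sizes are coarse, so the true maximum may lie between tested points (Caffe, for instance, was only tried at 48 and 64).

```python
# VGG16 training runs on the 8 GB GTX 1070: MB used, or None for OoM.
# Figures copied from the tables in this thread.
RUNS = {
    "caffe": {32: 6015, 42: 7007, 48: 7995, 64: None},
    "pytorch, cudnn.benchmark=True": {32: 6447, 42: 6907, 48: None, 64: None},
    "pytorch, cudnn.benchmark=False": {48: 7657, 52: 8115, 56: 7409, 60: 7659},
}


def max_batch_that_fit(runs):
    """Largest tested batch size that trained without running out of memory."""
    fitting = [batch for batch, mb in runs.items() if mb is not None]
    return max(fitting) if fitting else None


for setup, runs in RUNS.items():
    print(f"{setup}: {max_batch_that_fit(runs)}")
```

By this metric, the tested maxima are 48 for Caffe, 42 for PyTorch with cudnn.benchmark=True, and 60 for PyTorch with cudnn.benchmark=False.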