I compared the memory cost of Caffe and PyTorch on a single GeForce GTX 1070, using an ImageNet classification task. The PyTorch code in this test is just the stock example code.
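For reference, here is a minimal sketch of the kind of single-batch measurement I mean by the "test" and "train" rows below (the torchvision model, the random input, and the nvidia-smi polling are illustrative assumptions; the actual runs used the stock example script and a recent PyTorch API):

```python
import subprocess
import torch
import torch.nn as nn
import torchvision.models as models

def gpu_mem_used_mib():
    # Same number nvidia-smi shows in its "Memory-Usage" column (MiB).
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"])
    return int(out.decode().strip().splitlines()[0])

model = models.alexnet().cuda()
criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

batch = torch.randn(256, 3, 224, 224).cuda()    # ImageNet-sized input
target = torch.randint(0, 1000, (256,)).cuda()  # fake labels

# "test": forward pass only
model.eval()
with torch.no_grad():
    model(batch)
print("test :", gpu_mem_used_mib(), "MiB")

# "train": forward + backward + parameter update
model.train()
optimizer.zero_grad()
loss = criterion(model(batch), target)
loss.backward()
optimizer.step()
print("train:", gpu_mem_used_mib(), "MiB")
```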
- AlexNet (memory in MiB, as reported by nvidia-smi)

batch size      256    512    768   1024
-----------------------------------------
caffe test     2443   4399   6365    OoM
caffe train    4229   7425    OoM    OoM
pytorch test   1855   2817   3167   3655
pytorch train  4803   4919   7041    OoM
This result is very good for PyTorch: it uses roughly 2/3 of the memory that Caffe uses. But when I ran the same test with VGG16, PyTorch's memory cost during training was higher than Caffe's:
- VGG16 (memory in MiB)

batch size       32     42     48     64
-----------------------------------------
caffe test     2931   3607   3983   5025
caffe train    6015   7007   7995    OoM
pytorch test   2173   2649   2795   3655
pytorch train  6447   6907    OoM    OoM
Note: OoM stands for "out of memory".
Does anyone have any idea about this?
Another observation: before training starts, the memory usage (as shown by nvidia-smi) first bursts to a high level (approximately 20% higher than the later value) and then settles at a lower value. This seems likely to cause a 'CUDNN_STATUS_ALLOC_FAILED' error. Is this normal?
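To narrow this down, I plan to log the memory usage over the first few iterations with something like the sketch below (the model, batch size, and the cuDNN benchmark flag are just my guesses at the relevant variables, not a confirmed explanation of the burst):

```python
import subprocess
import torch
import torch.nn as nn
import torchvision.models as models

def gpu_mem_used_mib():
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"])
    return int(out.decode().strip().splitlines()[0])

# cuDNN's algorithm search allocates temporary workspace; toggling this
# flag is one variable worth checking against the transient peak.
torch.backends.cudnn.benchmark = True

model = models.vgg16().cuda()
criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batch = torch.randn(32, 3, 224, 224).cuda()
target = torch.randint(0, 1000, (32,)).cuda()

for i in range(5):
    optimizer.zero_grad()
    loss = criterion(model(batch), target)
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()
    # The first iteration or two should show the peak; later ones the steady state.
    print("iter", i, gpu_mem_used_mib(), "MiB")
```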