When I try to train a ResNet-152 model pre-trained on ImageNet, I get the following error:
RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66
I am using a K80 on an AWS EC2 p2.xlarge instance, so I am confused, since there should be 12 GB of VRAM available.
Any thoughts on why this is occurring?
Could it be that the batch size is too large for a ResNet-152?
It threw the error at the end of the first training epoch, just as it was about to begin testing for accuracy.
The batch_size was 20 for both the trainloader and the testloader, which already seems fairly low. Should I reduce it further to, say, 10?
import torch.utils.data as data_utils

trainloader_aug = data_utils.DataLoader(train_dataset, batch_size=20, shuffle=True)
testloader = data_utils.DataLoader(test_dataset, batch_size=20, shuffle=False)  # no need to shuffle the test set
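Since the OOM hits right when evaluation begins, one likely culprit (besides batch size) is running the test loop with gradient tracking still enabled, so autograd keeps every intermediate activation alive on top of the training state. A minimal sketch of an evaluation loop wrapped in torch.no_grad() (the model, loader, and device names here are placeholders, not from the original post):

import torch

def evaluate(model, testloader, device="cuda"):
    # model.eval() switches BatchNorm/Dropout to inference behavior;
    # torch.no_grad() disables autograd bookkeeping, so activations are
    # freed immediately and evaluation uses far less GPU memory.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in testloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

With this wrapper, the test pass often fits in memory even when the bare forward loop does not, because no graph is retained between batches.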
Yep, that solved it; the batch_size was apparently too large. Thanks for your help!
Hi, what batch_size did you set? I am using a p2.xlarge and ResNet-152, but it keeps throwing the runtime error even though I have set it to 10.
I am doing two-class classification with ResNet-34, and suddenly the same error, "out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58", is occurring. The code previously ran with any batch size. I reduced the batch size to 2, but the error remains. Does anyone know how to rectify this?
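If code that used to run suddenly OOMs even at batch_size=2, the GPU may already be occupied, either by a stale process from an earlier run (check with nvidia-smi from the shell) or by PyTorch's own caching allocator. A sketch of how to inspect and release cached memory from within a script (this only runs on a CUDA-capable build; the exact counter names vary slightly across PyTorch versions):

import torch

if torch.cuda.is_available():
    # Memory held by live tensors vs. memory the caching allocator
    # has reserved from the driver (reserved >= allocated).
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")
    # Return cached, unused blocks to the driver. This does NOT free
    # live tensors; delete references to them first if needed.
    torch.cuda.empty_cache()

If nvidia-smi shows another Python process holding most of the 12 GB, killing that process is usually the actual fix rather than shrinking the batch further.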