[RESOLVED] RuntimeError: cuda runtime error (2)

nikmentenson · May 17, 2017, 3:29pm

Hi,

When I try to train a ResNet-152 model pre-trained on ImageNet, I get the following error:

RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66

I am using a K80 on an AWS EC2 p2.xlarge instance, which has 12 GB of VRAM available, so I am confused.
Any thoughts on why this is occurring?

Thanks

smth · May 17, 2017, 10:19pm

Is it possible that the batch size is too large for a ResNet-152?
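Here is a rough sketch of why batch size matters even with 12 GB. Parameters, gradients and optimizer buffers take a fixed amount of memory, while the activations stored for the backward pass grow roughly linearly with batch size. The helpers below (`param_state_bytes` and `max_batch_size` are hypothetical names) make several assumptions: fp32 values, one optimizer buffer per parameter (as with SGD momentum), and a per-sample activation cost that you would have to measure yourself. They also ignore the CUDA context, cuDNN workspaces and allocator overhead, so treat the result as an optimistic upper bound.

```python
def param_state_bytes(num_params: int, copies: int = 3, bytes_per_value: int = 4) -> int:
    """Bytes held by parameters plus per-parameter training state.

    Assumption: copies=3 means fp32 weights, their gradients, and one
    optimizer buffer (e.g. SGD momentum). Adjust for your optimizer.
    """
    return num_params * copies * bytes_per_value


def max_batch_size(total_bytes: int, fixed_bytes: int, per_sample_bytes: int) -> int:
    """Largest batch that fits if memory grows linearly with batch size.

    per_sample_bytes is an assumed, separately measured activation cost.
    """
    if per_sample_bytes <= 0:
        raise ValueError("per_sample_bytes must be positive")
    return max(0, (total_bytes - fixed_bytes) // per_sample_bytes)


# Assumption: torchvision's resnet152 has about 60.2M parameters (60,192,808).
fixed = param_state_bytes(60_192_808)
print(f"weights + grads + momentum: {fixed / 2**20:.0f} MiB")
# prints: weights + grads + momentum: 689 MiB
```

The fixed part comes to under 1 GB, so when training a ResNet-152 runs out of memory, the activations that scale with batch_size are the likely culprit. That is why lowering the batch size helps.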

nikmentenson · May 18, 2017, 4:43am

@smth
So, it threw this error at the end of the first training epoch, just as it was about to begin testing for accuracy.
The batch_size was 20 for both the trainloader and the testloader, which already seems fairly low, so I am confused. Should I reduce it further to, say, 10?

Thanks

Settings:

import torch.utils.data as data_utils

trainloader_aug = data_utils.DataLoader(train_dataset, batch_size=20, shuffle=True)
testloader = data_utils.DataLoader(test_dataset, batch_size=20, shuffle=True)

nikmentenson · May 18, 2017, 5:10am

Yep, reducing the batch size solved it; apparently 20 was too large. Thanks for your help!
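Rather than guessing at 20, 10 or 2, one option is to probe for a batch size that fits by halving it on each OOM. This is a minimal sketch built on two assumptions. First, there is a hypothetical callback `try_step(batch_size)` that runs one forward/backward pass; because the error above appeared just as testing began, it should run one evaluation pass as well. Second, PyTorch reports the failure as a `RuntimeError` whose message contains "out of memory", as in the traceback at the top of this thread.

```python
from typing import Callable


def is_oom(err: RuntimeError) -> bool:
    # Assumption: CUDA OOM surfaces as a RuntimeError mentioning "out of memory".
    return "out of memory" in str(err)


def find_batch_size(try_step: Callable[[int], None], start: int, minimum: int = 1) -> int:
    """Halve the batch size until one call to try_step(batch_size) succeeds.

    try_step is a hypothetical callback that runs one training (and eval)
    step at the given batch size and raises RuntimeError on OOM.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if not is_oom(err):
                raise  # a different error: don't hide it
            batch_size //= 2
    raise RuntimeError(f"even batch_size={minimum} runs out of memory")


# Demo with a fake step that "fits" only at 12 samples or fewer.
def fake_step(batch_size: int) -> None:
    if batch_size > 12:
        raise RuntimeError("cuda runtime error (2) : out of memory")


print(find_batch_size(fake_step, start=20))  # prints 10
```

In a real training loop you would likely also release references to the failed batch before retrying, and possibly call `torch.cuda.empty_cache()`. Leaving some headroom below the value found is also prudent, since memory use can vary from batch to batch.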

11165 · March 21, 2018, 3:01am

Hi, what batch_size did you set? I am using a p2.xlarge and ResNet-152, but it keeps throwing the same “runtime error” even after I set it to 10.

Iram_Shahzadi · October 3, 2018, 1:24am

I am doing two-class classification with ResNet-34, and suddenly the same error, “out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58”, is occurring. Previously the code ran with any batch size. I reduced the batch size to 2, but the error remains. Does anyone know how to fix this error?
Thanks!
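Suppose even batch_size=2 fails for a model that previously ran at any batch size. Then one thing worth ruling out is another process, such as a stale run or notebook kernel, that is already holding GPU memory. The following is a minimal sketch that assumes `nvidia-smi` is on the PATH. `parse_gpu_memory` and `query_gpu_memory` are hypothetical helper names.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse `nvidia-smi --query-gpu=memory.used,memory.total
    --format=csv,noheader,nounits` output into (used_MiB, total_MiB) per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        # int() tolerates the space nvidia-smi puts after each comma.
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Run nvidia-smi and return (used_MiB, total_MiB) for each visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

Run `query_gpu_memory()` before starting training. If `used` is already close to `total` for your GPU, the memory was taken before your model was even loaded, and the process list in plain `nvidia-smi` output should show which PID holds it.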