[RESOLVED] RuntimeError: cuda runtime error (2)


When I try to train a ResNet152 model pre-trained on ImageNet, I get the following error:

RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66

I am using a K80 on an AWS EC2 p2.xlarge instance, so I am confused, since there is 12 GB of VRAM available.
Any thoughts on why this is occurring?


Is it possible that the batch size is too large for a ResNet152?


So, it threw this error at the end of training the first epoch, when it was just about to begin testing for accuracy.
The batch_size was 20 for both the trainloader and testloader, so I am confused, since it is already fairly low. Should I reduce it further to, say, 10?



trainloader_aug = data_utils.DataLoader(train_dataset, batch_size=20, shuffle=True)
testloader = data_utils.DataLoader(test_dataset, batch_size=20, shuffle=True)

Yep, that solved it; the batch_size was apparently too large. Thanks for your help!
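
Since the error appeared just as testing was about to start, another thing that may help besides a smaller batch_size is running the accuracy check without autograd bookkeeping. The thread does not confirm this was the cause, so treat the following as a minimal sketch. It assumes PyTorch 0.4 or later (for `torch.no_grad()`), and the `evaluate` helper, the tiny linear model, and the random data are hypothetical stand-ins so that it runs on any machine.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return classification accuracy without building autograd graphs."""
    model.eval()  # put dropout / batch-norm layers into inference mode
    correct, total = 0, 0
    with torch.no_grad():  # activations are not kept for a backward pass
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            preds = model(inputs).argmax(dim=1)
            # .item() converts to a Python int, so no tensor is retained
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Hypothetical stand-ins for the real ResNet152 and test set.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 2).to(device)
test_dataset = TensorDataset(torch.randn(40, 8), torch.randint(0, 2, (40,)))
testloader = DataLoader(test_dataset, batch_size=20, shuffle=False)

accuracy = evaluate(model, testloader, device)
```

With the real model, the same `evaluate` function would be called once per epoch after the training loop, using the existing testloader.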

Hi, what batch_size did you set? I am using a p2.xlarge and ResNet152, but it keeps saying “runtime error” even though I have set it to 10.

I am doing two-class classification with resnet34, and suddenly the same error, “out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58”, is occurring. Previously the code ran with any batch size. I reduced the batch size to 2, but the error remains. Does anyone know how to fix this?
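
When the error persists even at a batch size of 2, it may be worth checking how much GPU memory is actually free before training starts, since another process (for example a stale Python session) could be holding it. This is only a guess at the cause. The sketch below assumes a recent PyTorch release that provides `torch.cuda.mem_get_info`; `gpu_memory_report` is a hypothetical helper name, and it returns None on a machine without CUDA.

```python
import torch


def gpu_memory_report(device_index=0):
    """Return free/total/allocated bytes for one GPU, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    # Device-wide figures: these include memory held by other processes.
    free, total = torch.cuda.mem_get_info(device_index)
    return {
        "free": free,
        "total": total,
        # Memory occupied by tensors of the current process only.
        "allocated_by_this_process": torch.cuda.memory_allocated(device_index),
    }


report = gpu_memory_report()
print(report)
```

If "free" is far below "total" while "allocated_by_this_process" is close to zero, the memory is probably held by a different process, and running `nvidia-smi` on the instance should show which one.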
