torch.nn.DataParallel(net) and cuda runtime error (2)

Hengck · November 2, 2017, 5:05am

I have a model net that works fine with batch=256 on a single GPU. Then I use net=torch.nn.DataParallel(net, device_ids=[0, 1, 2, 3]) and increase my batch to 4*256.

It gives me cuda runtime error (2). Is there anything that I have done wrong?

It also won’t work if I try net=torch.nn.DataParallel(net, device_ids=[0, 1, 2, 3]) and batch=2*256.

richard · November 2, 2017, 2:50pm

Could you provide some code to give more context for what you’re doing? From googling, it seems like cuda runtime error (2) is associated with out-of-memory errors. Perhaps your batch size is too large? What happens when you use DataParallel with a smaller batch size?
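
For reference, here is a rough sketch of how a batch ends up divided across GPUs. It assumes DataParallel splits the input along dimension 0 using the same rule as torch.chunk (chunk size is the ceiling of batch size over GPU count); the helper name per_gpu_batch_sizes is made up for illustration and is not part of PyTorch.

```python
def per_gpu_batch_sizes(batch_size, num_gpus):
    """Return the chunk sizes produced by a torch.chunk-style split along dim 0.

    Hypothetical helper: it only mirrors the splitting rule and does not touch any GPU.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    chunk = -(-batch_size // num_gpus)  # ceiling division
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)  # the last chunk may be smaller
        sizes.append(take)
        remaining -= take
    return sizes


if __name__ == "__main__":
    for batch in (4 * 256, 2 * 256):
        print(batch, per_gpu_batch_sizes(batch, 4))
```

Under that assumption, batch=2*256 on four GPUs gives each replica only 128 samples, which is less than the 256 that fit on a single GPU. If it still runs out of memory, the per-replica input size is probably not the whole story. One thing worth checking is GPU 0, since by default DataParallel gathers the outputs on device_ids[0], so that device tends to use more memory than the others.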

rteja1113 · December 4, 2017, 12:27am

Hi Heng, I’m having the same problem. Were you able to figure it out?

Thank you

SherlockLiao · December 4, 2017, 2:56am

If you reduce the batch size, does it work?
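
One way to answer that systematically is to halve the batch size until a training step succeeds. The sketch below is only an illustration: find_working_batch_size and run_step are hypothetical names, and run_step stands for whatever function runs one forward/backward pass of your model at a given batch size. It assumes the failure surfaces as a RuntimeError whose message contains "out of memory".

```python
def find_working_batch_size(run_step, start, minimum=1):
    """Halve the batch size until run_step(batch_size) stops raising an OOM error.

    run_step is a user-supplied callable (hypothetical here) that runs one
    training step. Returns the first batch size that works, or None if even
    the minimum fails.
    """
    batch_size = start
    floor = max(minimum, 1)  # never try a batch size below 1
    while batch_size >= floor:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # Assumption: out-of-memory failures mention it in the message.
            if "out of memory" not in str(err):
                raise  # some other error; do not hide it
            batch_size //= 2
    return None
```

If this returns a batch size well below what a single GPU handles, the limit is likely somewhere other than the per-replica input, for example memory already in use on one of the devices.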