I have a model, net, that works fine with batch=256 on a single GPU. Then I use net = torch.nn.DataParallel(net, device_ids=[0, 1, 2, 3]) and increase my batch size to 4*256.
It gives me cuda runtime error (2). Is there anything I have done wrong?
It also doesn't work if I try net = torch.nn.DataParallel(net, device_ids=[0, 1, 2, 3]) with batch = 2*256.
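Here is a minimal sketch of the setup described above, assuming PyTorch. TinyNet and its layer sizes are hypothetical stand-ins for the poster's net, which isn't shown. nn.DataParallel splits each input batch along dim 0 across device_ids, so a global batch of 4*256 gives each of four GPUs 256 samples. The module's parameters must live on device_ids[0], and outputs are gathered back there (output_device defaults to device_ids[0]). As a result, GPU 0 usually needs more memory than it did in the single-GPU run. That is one possible reason a per-GPU batch that fit before can fail under DataParallel. The sketch falls back to a plain model when fewer than two GPUs are visible.

```python
import torch
import torch.nn as nn


# Hypothetical stand-in for the poster's `net`.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 10)

    def forward(self, x):
        return self.fc(x)


PER_GPU_BATCH = 256  # the batch size that fit on a single GPU

n_gpus = torch.cuda.device_count()
device = torch.device("cuda:0" if n_gpus >= 1 else "cpu")

# DataParallel requires the module to be on device_ids[0] before wrapping.
net = TinyNet().to(device)
if n_gpus > 1:
    # Replicates the model on each GPU and scatters inputs along dim 0.
    net = nn.DataParallel(net, device_ids=list(range(n_gpus)))

# Scale the global batch so each replica sees about PER_GPU_BATCH samples.
global_batch = PER_GPU_BATCH * max(n_gpus, 1)
x = torch.randn(global_batch, 32, device=device)
out = net(x)  # with DataParallel, outputs are gathered onto device_ids[0]
print(out.shape)
```

If GPU 0 runs out of memory while the other GPUs still have headroom, a somewhat smaller global batch (rather than an exact multiple of 256) may fit.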
Could you provide some code to give more context on what you're doing? From googling, cuda runtime error (2) is CUDA's out-of-memory error (cudaErrorMemoryAllocation), so perhaps your batch size is too large. What happens when you try DataParallel with a smaller batch size?
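One way to answer "what happens with a smaller batch size" systematically is to halve the batch until a step succeeds. The sketch below is a hypothetical helper (find_max_batch is not a PyTorch API). It assumes the out-of-memory failure surfaces as a RuntimeError whose message contains "out of memory". That holds for the old "cuda runtime error (2) : out of memory" message and for newer torch.cuda.OutOfMemoryError, which subclasses RuntimeError. The run_step callable and fake_step are illustrative placeholders for one real training step.

```python
def find_max_batch(run_step, start_batch, min_batch=1):
    """Try run_step(batch), halving batch on out-of-memory errors.

    run_step is a hypothetical callable that performs one forward/backward
    pass with the given batch size. Returns the first batch size that works.
    """
    batch = start_batch
    while batch >= min_batch:
        try:
            run_step(batch)
            return batch
        except RuntimeError as err:
            # Only retry on OOM; re-raise any other RuntimeError unchanged.
            if "out of memory" not in str(err):
                raise
            batch //= 2
    raise RuntimeError(f"no batch size >= {min_batch} fits in memory")


# Hypothetical step that "runs out of memory" above 300 samples.
def fake_step(batch):
    if batch > 300:
        raise RuntimeError("cuda runtime error (2) : out of memory")


print(find_max_batch(fake_step, 4 * 256))  # tries 1024, 512, then prints 256
```

With a real model, calling torch.cuda.empty_cache() after catching the error may help release cached blocks before the retry.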
Hi Heng, I'm having the same problem. Were you able to figure it out?
If you reduce the batch size, does it work?