What if you pass
device_ids=[1, 2, 3, 0]
to nn.DataParallel?
Ha! That solves the issue! So when I define
device = torch.device("cuda:1")
and use it for
model.to(device)
(and for the data tensors), it breaks if I leave DataParallel at its defaults
or pass something like device_ids=[0, 1, 2, 3]
to DataParallel,
but if I change it to device_ids=[1, 2, 3, 0]
it works. Just wondering whether this is intended behavior or a bug. @smth
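For what it's worth, this matches how nn.DataParallel is documented to behave: it scatters inputs from, and gathers outputs to, device_ids[0], so the wrapped model's parameters must already live on that device. A minimal sketch of the working setup (the model, tensor sizes, and the 4-GPU assumption are illustrative, not from the thread):

```python
import torch
import torch.nn as nn

# Hypothetical toy model just for illustration
model = nn.Linear(10, 10)

primary = 1                  # the GPU you want to act as the primary device
device_ids = [1, 2, 3, 0]    # primary must come FIRST in device_ids

# DataParallel scatters inputs from device_ids[0] and gathers outputs there,
# so the model must be moved to that same device before wrapping.
assert device_ids[0] == primary

if torch.cuda.device_count() >= 4:
    device = torch.device(f"cuda:{primary}")
    model = model.to(device)
    model = nn.DataParallel(model, device_ids=device_ids)
    out = model(torch.randn(8, 10, device=device))
    print(out.shape)
else:
    print("fewer than 4 GPUs available; skipping the parallel run")
```

With device_ids=[0, 1, 2, 3] (or the default) the wrapper expects the parameters on cuda:0, but they are on cuda:1, which is why it breaks; reordering to [1, 2, 3, 0] makes the primary device consistent with model.to(device).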