I see. But that should not be the case, since both are moved to args.local_rank. Anyway, I did what you suggested and also changed the test-batch-size to 1024. Here's the outcome:
Rank 0 test on device cuda:0
Rank 1 test on device cuda:1
after data=data.to(device,), before output=model(data) in test function, batch_idx: 0 device: 1
after data=data.to(device,), before output=model(data) in test function, batch_idx: 0 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 1 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 2 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 3 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 4 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 5 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 6 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 7 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 8 device: 0
after data=data.to(device,), before output=model(data) in test function, batch_idx: 9 device: 0
Test set: Average loss: 0.0275, Accuracy: 9913/10000 (99.13%)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 1 (while checking arguments for cudnn_convolution)
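The error says the input batch sits on cuda:0 while the conv weight sits on cuda:1, so within that process the data and the model ended up on different GPUs. One way to rule this out is to stop relying on a separately computed `device` variable in the test function and instead take the device from the model's own parameters. Below is a minimal sketch, assuming every parameter of the model lives on a single device (true for a plain model or a DistributedDataParallel wrapper with one GPU per process). The names `pick_device` and `evaluate` are hypothetical and do not come from the original script.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pick_device(local_rank: int) -> torch.device:
    """Hypothetical helper: pin this process to its own GPU when one is available."""
    if torch.cuda.is_available():
        # Makes cuda:local_rank the default CUDA device for this process,
        # so later bare .cuda() calls also land on the right GPU.
        torch.cuda.set_device(local_rank)
        return torch.device("cuda", local_rank)
    return torch.device("cpu")


def evaluate(model: nn.Module, loader) -> tuple[float, int]:
    """Hypothetical test loop: returns (average loss, number of correct predictions)."""
    # Assumption: all parameters share one device. Reading it from the weights
    # means the input batch always follows the model, whatever rank this is.
    device = next(model.parameters()).device
    model.eval()
    total_loss, correct, n = 0.0, 0, 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # Summed loss per batch; divided by the sample count at the end.
            total_loss += F.cross_entropy(output, target, reduction="sum").item()
            correct += (output.argmax(dim=1) == target).sum().item()
            n += target.size(0)
    return total_loss / n, correct
```

In each process you would call `pick_device(args.local_rank)` once at startup, move the model there before wrapping it, and then let `evaluate` follow the model rather than a global `device`.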