When I run a forward pass for
class Test(nn.Module):
    def forward(self, x):
        y = torch.zeros(x.size(0), 10, 20).cuda()  # <-------------
        return y + x

test = nn.DataParallel(Test()).cuda()
with 2 GPUs, on which GPU is y? Does .cuda() somehow know which GPU is handling the current half of the minibatch? Or will y be pushed to GPU0 by default and then transferred if x is on GPU1?
Edit: I guess it will be pushed to GPU0. To create it on the same GPU as x, use new() as described here: https://github.com/pytorch/pytorch/blob/master/docs/source/notes/cuda.rst
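As a minimal sketch of that device-preserving pattern (assuming x is the input tensor inside forward): instead of a bare .cuda(), allocate the new tensor with device=x.device (or x.new_zeros(...)), so each DataParallel replica creates y on its own GPU. The example below runs on CPU as well, since it just follows x's device:

```python
import torch
import torch.nn as nn

class Test(nn.Module):
    def forward(self, x):
        # Allocate y on whatever device x already lives on; under
        # DataParallel each replica's x is on that replica's GPU,
        # so y lands there too instead of defaulting to GPU0.
        y = torch.zeros(x.size(0), 10, 20, device=x.device, dtype=x.dtype)
        # Equivalent pattern from the CUDA notes: y = x.new_zeros((x.size(0), 10, 20))
        return y + x

model = Test()
x = torch.ones(4, 10, 20)  # same code works unchanged for CUDA tensors
out = model(x)
print(out.device, out.shape)
```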