Behaviour of tensor.cuda() in nn.DataParallel

Hello,
when I run a forward pass for

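import torch
import torch.nn as nn

class Test(nn.Module):
    def forward(self, x):
        # y is created on the CPU and then moved with a bare .cuda() call
        y = torch.zeros(x.size()[0], 10, 20).cuda() # <-------------

        return y + x

test = nn.DataParallel(Test()).cuda()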

with 2 GPUs, on which GPU is y? Does .cuda() somehow know which GPU is handling the current half of the minibatch? Or is y pushed to GPU0 by default and then transferred if x is on GPU1?

Edit: I guess it will be pushed to GPU0. To create y on the same GPU as x, use new, as described here: https://github.com/pytorch/pytorch/blob/master/docs/source/notes/cuda.rst
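
For reference, here is a minimal sketch of what I mean by creating y next to x, so the result does not depend on what a bare .cuda() does. It assumes a PyTorch version that has Tensor.new_zeros (older releases would use x.new(...).zero_() instead); the input shape and the device checks are only there so the snippet also runs on a machine with one GPU or none.

```python
import torch
import torch.nn as nn


class Test(nn.Module):
    def forward(self, x):
        # new_zeros allocates on the same device (and with the same dtype)
        # as x, so no assumption about the "current" GPU is needed.
        y = x.new_zeros(x.size(0), 10, 20)
        return y + x


model = Test()
if torch.cuda.device_count() > 1:
    # Only wrap in DataParallel when several GPUs are actually present.
    model = nn.DataParallel(model).cuda()

# Illustrative input; the shape is chosen to match y exactly.
x = torch.randn(4, 10, 20)
if torch.cuda.is_available():
    x = x.cuda()

out = model(x)
print(out.shape, out.device)
```

With this version each replica builds y on the device of the chunk of x it received, so y + x never mixes devices.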