Parameters get copied to wrong GPU

I’m working on a server with multiple GPUs (cuda:0 through cuda:3). I’ve allocated all my model parameters to cuda:2, and the default CUDA device has been set to cuda:2 as well. However, as soon as I iterate over the parameters, something starts using up memory on cuda:0.

Am I doing something wrong? I’ve tried the following:

--------Example Code----------

torch.cuda.set_device(2)                        # This line doesn't seem to make any difference.
model = MyModel().cuda(2)                       # I've checked that all parameters in named_parameters() are on cuda:2.

for param in model.parameters():                # a) For some reason this takes up lots of memory on cuda:0.
    pass

for name, param in model.named_parameters():    # b) This eats up memory on cuda:0 as well.
    pass

optimizer = torch.optim.Adam(model.parameters())
for g in optimizer.param_groups:                # c) So does this.
    pass

list(model.parameters())                        # d) So does this.

with torch.cuda.device(2):                      # e) This still takes up some memory on cuda:0,
    for g in optimizer.param_groups:            #    but not as much -- around 200 MB in my case.
        pass
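To see where the memory is actually going, I've been checking the allocation per device with a small helper like the one below (a sketch, not from the code above). Note that `torch.cuda.memory_allocated` only counts tensors allocated by PyTorch, so the CUDA context itself (typically a few hundred MB per device in nvidia-smi) won't show up here:

```python
import torch

def allocated_per_gpu():
    """Return bytes allocated by PyTorch on each visible GPU.

    Returns an empty dict if CUDA is unavailable, so the helper is
    safe to call on a CPU-only machine as well.
    """
    if not torch.cuda.is_available():
        return {}
    return {i: torch.cuda.memory_allocated(i)
            for i in range(torch.cuda.device_count())}
```

If cuda:0 shows 0 bytes allocated here but nvidia-smi still reports usage, the memory is presumably just the CUDA context being initialized on device 0, not parameter copies.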

PyTorch version: 0.4.1.post2
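One workaround I've seen suggested (I'm not sure it addresses the root cause): hide the other GPUs from the process entirely, so only the intended card is visible and nothing can touch cuda:0. This has to happen before `import torch` or any CUDA initialization:

```python
import os

# Must be set before `import torch` runs any CUDA initialization.
# Physical GPU 2 becomes this process's cuda:0; GPUs 0, 1, and 3
# are invisible to the process, so nothing can allocate on them.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
```

Equivalently, `CUDA_VISIBLE_DEVICES=2 python train.py` from the shell. Is this the recommended approach, or should `torch.cuda.set_device` alone be enough?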