To(device) removes tensor from model parameters

benoriol · January 23, 2021, 10:24pm

I have found the following issue when working with the nn.Parameter() class.

Inside a custom nn.Module class, I defined a tensor such as self.trainable_tensor = nn.Parameter(torch.rand(1, 2, 3)).to(device).

I made the attribute an nn.Parameter so that it would be included in the module’s parameter list and therefore be trained. After many hours of debugging I found out that the .to(device) call actually removes it from this list, but only when the device is CUDA. I don’t know why that is, or whether it is a bug. I understand that the .to(device) call is not necessary: if the attribute is already in the parameter list, model.to(device) will move it to the device along with the rest of the model. I assumed the extra call would merely be superfluous, but it actually broke my implementation completely.

Is this a bug? Or is there some concept I am missing? Why does .to(device("cuda")) remove the tensor from the parameter list?

Thanks,

Ben

InnovArul · January 23, 2021, 11:15pm

You have to move the tensor before creating the nn.Parameter.
i.e.,

self.trainable_tensor = nn.Parameter(torch.rand(1, 2, 3).to(device))

In your case, calling .to(device) after creating the nn.Parameter returned a plain tensor rather than an nn.Parameter,
i.e., self.trainable_tensor was assigned a tensor, which the module does not register as a parameter.

You can print the type of the variable (type(self.trainable_tensor)) to make sure that you have created the nn.Parameter correctly.
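As a rough sketch of the difference, the two patterns can be compared side by side. The class names Broken and Fixed are made up for illustration, and the device falls back to the CPU when CUDA is not available, in which case both variants register the parameter.

```python
import torch
import torch.nn as nn

# Assumption for this sketch: use CUDA if present, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class Broken(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        # .to() is called on the Parameter itself. If the device changes,
        # the result is a plain tensor, so nothing gets registered.
        self.trainable_tensor = nn.Parameter(torch.rand(1, 2, 3)).to(device)


class Fixed(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        # Move the raw tensor first, then wrap it in nn.Parameter.
        self.trainable_tensor = nn.Parameter(torch.rand(1, 2, 3).to(device))


broken_names = [name for name, _ in Broken().named_parameters()]
fixed_names = [name for name, _ in Fixed().named_parameters()]

print(device.type)
print(broken_names)  # empty on CUDA, contains the name on CPU
print(fixed_names)   # contains the name on either device
print(type(Fixed().trainable_tensor))
```

Another option, in line with what the original post already mentions, is to leave .to(device) out of __init__ entirely and call model.to(device) once on the whole module.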

benoriol · January 24, 2021, 7:55am

Woah thanks, that makes sense!

However, the reason my debugging took so long is that this didn’t happen when the device was the CPU. As a result, training on the CPU yielded a completely different model than training with CUDA. Might that be a bug? I mean, the model should be (almost) the same regardless of which device you train on.

Thanks!

Benet

InnovArul · January 24, 2021, 10:58am

.to() first checks whether the tensor is already on the requested device, and copies the memory only if the device is different. In the CPU case, the tensor was already on the requested device, so .to() did nothing and returned the original object, which is still an nn.Parameter.
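This can be checked without a GPU. The sketch below assumes a CPU-only setup and uses a dtype change as a stand-in for the device change, since either one makes .to() produce a new tensor instead of returning the original object.

```python
import torch
import torch.nn as nn

p = nn.Parameter(torch.rand(1, 2, 3))  # float32, created on the CPU

# Nothing to change: .to() hands back the very same object.
same = p.to("cpu")
print(same is p)  # True

# Something to change (here the dtype, as a stand-in for the device):
# .to() builds a new tensor, and that result is not an nn.Parameter.
converted = p.to(torch.float64)
print(converted is p)  # False
print(isinstance(converted, nn.Parameter))
print(converted.is_leaf)
```

The converted tensor still tracks gradients back to p, but it is an intermediate result rather than a leaf, so assigning it as a module attribute does not add it to parameters().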

benoriol · January 24, 2021, 11:07am

Oh I see, thanks!

Ben