I have found the following issue when working with the nn.Parameter() class.
Inside a custom nn.Module class, I defined a tensor like this:
self.trainable_tensor = nn.Parameter(torch.rand(1, 2, 3)).to(device)
I set the attribute as an nn.Parameter object so that it would be included in the parameter list and therefore be trained. After many hours of debugging, I found out that the .to(device) call actually removes it from this list, but only when the device is CUDA. I don't know why that is, or whether it is a bug. I understand that the .to(device) call is not strictly necessary here, because if the attribute is already in the parameter list, then calling
model.to(device) will move this tensor to the target device along with the rest of the model. I assumed the extra call would merely be superfluous, but in fact it completely broke my implementation.
Is this a bug? Or is there some concept I am missing? Why does .to(device("cuda")) remove the tensor from the parameter list?
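For reference, here is a minimal sketch of what I believe is happening (the class and attribute names are made up for illustration). As far as I can tell, .to() returns a plain torch.Tensor whenever it has to make a copy, e.g. for a new device or dtype, which "unwraps" the nn.Parameter; when the call is a no-op (same device, same dtype, as on CPU), it returns the Parameter itself unchanged. The sketch below forces the copy with a dtype change so the effect is visible without a GPU:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered correctly: the attribute is an nn.Parameter.
        self.good = nn.Parameter(torch.rand(1, 2, 3))
        # .to() here must copy (dtype change), so it returns a plain
        # torch.Tensor, and the attribute is NOT registered as a
        # parameter. A copy to a CUDA device behaves the same way.
        self.bad = nn.Parameter(torch.rand(1, 2, 3)).to(torch.float64)

model = Model()
names = [n for n, _ in model.named_parameters()]
print(names)                              # only 'good' is listed
print(isinstance(model.bad, nn.Parameter))  # False: it became a Tensor
```

If this is right, it would also explain why the problem only showed up on CUDA: on CPU the .to(device) call was a no-op and the Parameter survived.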