Hi,
The thing is that .cuda() is an out-of-place operation: it returns a new tensor instead of modifying the original. So the result is not a Parameter anymore, it’s just a Tensor. Since it’s not a Parameter, it is not registered on the module and is not included in the state_dict.
You can use tensor = torch.randn(10, device="cuda") to create the tensor directly on the GPU and avoid such problems.
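Here is a minimal sketch of the difference. The module names Broken and Fixed and the attribute name w are made up for illustration, and the CPU fallback is only there so the snippet also runs on a machine without a GPU (the Broken case needs CUDA to show the problem).

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch still runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"


class Fixed(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Create the data on the target device first, then wrap it,
        # so the attribute itself is an nn.Parameter and gets registered.
        self.w = nn.Parameter(torch.randn(10, device=device))


class Broken(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # .cuda() copies to the GPU and returns a plain Tensor,
        # so the module does not register it as a parameter.
        self.w = nn.Parameter(torch.randn(10)).cuda()


fixed_keys = list(Fixed().state_dict().keys())
print(fixed_keys)

if torch.cuda.is_available():
    broken = Broken()
    # Shows the attribute type and the state_dict keys of the broken module.
    print(type(broken.w).__name__, list(broken.state_dict().keys()))
```

Another option is to keep the Parameter on the CPU in __init__ and move the whole module afterwards with model.cuda() or model.to(device), which moves the registered parameters and keeps them as Parameters.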