I wonder about the inner mechanism when calling
mymodule.to(device) for a typical module with a parameter such as
self.w = nn.Parameter(torch.ones(1))
When we call
mymodule.cuda(), the parameter
w ends up pointing to a
GPU tensor. However, our optimizer cannot optimize
a non-leaf tensor, and a GPU tensor
w is not a leaf if it is created by
gpu_w = cpu_w.cuda(). So how does PyTorch solve this problem?
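For reference, here is a minimal sketch of the situation I mean (MyModule and the optimizer choice are just for illustration, and it assumes a CUDA device is available):

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # parameter is created on CPU
        self.w = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * self.w

m = MyModule()
m.cuda()  # same effect for the parameters as m.to("cuda")

# the parameter now lives on the GPU, yet the optimizer can still update it
print(m.w.device, m.w.is_leaf)
opt = torch.optim.SGD(m.parameters(), lr=0.1)
```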
But the way you defined w, it is a leaf parameter.
With
gpu_w = cpu_w.cuda(), on the other hand,
gpu_w is not a leaf tensor.
That is a product of the way you define it:
gpu_w itself is not a leaf because it is a copy of the CPU tensor; the CPU version is in fact the leaf.
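To make the leaf distinction concrete, here is a small check you can run (the tensor names are just illustrative, and it assumes a CUDA device is available):

```python
import torch

cpu_w = torch.ones(1, requires_grad=True)  # created directly -> leaf
gpu_w = cpu_w.cuda()                       # result of an op on cpu_w -> non-leaf

print(cpu_w.is_leaf)  # True
print(gpu_w.is_leaf)  # False

# gradients flow back to the CPU leaf, which is what an optimizer would update;
# gpu_w, being a non-leaf, does not retain a .grad by default
(gpu_w * 2).sum().backward()
print(cpu_w.grad)     # tensor([2.])
```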
You can bypass this by calling
self.w = nn.Parameter(torch.ones(1).cuda())
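With that bypass the parameter is created directly as a GPU leaf, so an optimizer can update it. A minimal sketch, again assuming a CUDA device is available:

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.ones(1).cuda())  # wraps the GPU tensor as a fresh leaf
print(w.is_leaf, w.device)              # True cuda:0

opt = torch.optim.SGD([w], lr=0.1)
loss = (w * 3).sum()
loss.backward()
opt.step()
print(w)                                # updated in place on the GPU
```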
It may be better to define that parameter on CPU and then move it to the GPU if you use it during the forward function; that makes much more sense. In fact, you can also wrap it inside another nn.Module so that it is allocated on the GPU along with the rest of the model, as in the sketch below.
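One way to follow that advice (module and attribute names here are just an example, not a fixed API): define the parameter on CPU, register it inside a wrapper nn.Module, and let .to(device) move it together with everything else.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Small wrapper module that owns the parameter."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(1))  # defined on CPU

    def forward(self, x):
        # by forward time the parameter lives wherever the module was moved
        return x * self.w

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = Scale()  # wrapping the parameter in a submodule

    def forward(self, x):
        return self.scale(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Model().to(device)          # moves scale.w along with everything else
out = model(torch.randn(4, device=device))
print(model.scale.w.is_leaf, model.scale.w.device)  # True, and on `device`
```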