I am wondering about the inner mechanism when calling mymodule.to(device) for a typical module:
```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(1))
```
When we call mymodule.cuda(), the parameter w created by nn.Parameter() ends up pointing to a GPU tensor. However, an optimizer cannot optimize a non-leaf tensor, and a GPU tensor created as gpu_w = cpu_w.cuda() is non-leaf. So how does PyTorch solve this problem?
Thank you.
It’s a product of the way you define it.
gpu_w itself is not a leaf, since it is a copy of the CPU tensor; the CPU version is in fact the leaf.
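A minimal sketch of that distinction (variable names mirror the question; it assumes a CUDA device is available):

```python
import torch

cpu_w = torch.ones(1, requires_grad=True)  # created directly by the user -> leaf
gpu_w = cpu_w.cuda()                       # produced by an op on cpu_w -> non-leaf

print(cpu_w.is_leaf)  # True
print(gpu_w.is_leaf)  # False

# torch.optim.SGD([gpu_w], lr=0.1)  # would be rejected: optimizers only accept leaf tensors
```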
You can bypass this by defining the parameter as self.w = nn.Parameter(torch.ones(1).cuda()).
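A sketch of that workaround using the module from the question (again assuming a CUDA device is available):

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Created directly on the GPU; nn.Parameter makes it a leaf.
        self.w = nn.Parameter(torch.ones(1).cuda())

m = MyModule()
print(m.w.is_leaf, m.w.device)                  # True cuda:0
opt = torch.optim.SGD(m.parameters(), lr=0.1)   # fine: m.w is a leaf
```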
Edit:
It may be better to define that parameter on the CPU and then move it to the GPU when you use it in the forward function; that makes much more sense. In fact, you can also wrap it inside another nn.Module in order to put it on the GPU inside __init__.
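A rough sketch of the first suggestion, defining the parameter on the CPU and moving a copy to the input's device inside forward (the forward body and the little training snippet are just illustrative assumptions):

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # The parameter lives on the CPU and stays the leaf the optimizer updates.
        self.w = nn.Parameter(torch.ones(1))

    def forward(self, x):
        # Copy to the input's device only for the computation; gradients flow
        # back through .to() to the CPU leaf self.w.
        w = self.w.to(x.device)
        return x * w

m = MyModule()
opt = torch.optim.SGD(m.parameters(), lr=0.1)  # optimizes the CPU leaf
x = torch.randn(4, device="cuda")              # assumes a CUDA device is available
m(x).sum().backward()                          # grad lands on m.w (on the CPU)
opt.step()
```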