Hi,
I am encountering a (to me) very strange issue with the function self.named_parameters().
Long story short:
I am trying to create the following layer:
self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd)).to(self.device)
After creation I generate a param_dict while setting up an optimizer with this function:
def get_param_dict(self):
    return {pn: p for pn, p in self.named_parameters()}
The strange behavior is that, because of the move to self.device, the pos_emb layer no longer shows up in the named_parameters dictionary:
self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd)).to(self.device)
print(self.get_param_dict().keys())
--> dict_keys([])
If I remove the .to(self.device) part, named_parameters behaves as expected:
self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
print(self.get_param_dict().keys())
--> dict_keys(['pos_emb'])
Why is this the case, and how can I fix it?
Somehow it affects only this nn.Parameter; all the other layers are listed correctly, with or without the device movement…
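For reference, here is a minimal standalone sketch of the same effect. It runs on CPU only, so instead of a device move it uses a dtype change, which (to my understanding) likewise makes .to() return a plain Tensor rather than an nn.Parameter, so the module never registers it:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # .to(...) on an nn.Parameter returns a plain (non-leaf) Tensor
        # whenever it actually produces a new tensor, so this assignment
        # is stored as an ordinary attribute, not a registered parameter.
        self.pos_emb = nn.Parameter(torch.zeros(1, 4, 8)).to(torch.float64)

class ToyFixedOrder(nn.Module):
    def __init__(self):
        super().__init__()
        # Without the trailing .to(...), the nn.Parameter is registered.
        self.pos_emb = nn.Parameter(torch.zeros(1, 4, 8))

print(dict(Toy().named_parameters()).keys())        # empty
print(dict(ToyFixedOrder().named_parameters()).keys())
```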
I need a working get_param_dict function for optimizer configuration.
I already tried training my model without moving the pos_emb layer to self.device, but then training fails because the tensor is on the wrong device (obviously).
Thanks a lot for any kind of hint or solution!