To what extent is it OK (or not OK) to have the embedding parameters passed to the optimizer multiple times? Is there a better way to handle this in idiomatic PyTorch? Ideally, I want to avoid allocating the embedding multiple times and then deallocating the copies I don't need.
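For concreteness, here's a minimal sketch of the situation I mean (the `Encoder` class, the GRU, and all sizes are made-up placeholders): one `nn.Embedding` instance shared by reference, whose weight ends up in the optimizer's parameter list twice.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(1000, 64)  # sizes are arbitrary

class Encoder(nn.Module):  # placeholder model
    def __init__(self, emb):
        super().__init__()
        self.embedding = emb  # shared by reference, not copied
        self.rnn = nn.GRU(64, 128, batch_first=True)

encoder = Encoder(embedding)

# embedding.weight is reachable both through embedding.parameters()
# and through encoder.parameters(), so it lands in the list twice:
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(embedding.parameters())
)
```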
Unless I'm missing something obvious, do you really need to explicitly pass embedding.parameters() to the optimizer? You already gave it encoder.parameters(), which includes embedding.parameters().
Looking at the nn.Module code, duplicate parameters are filtered out when you iterate over them (named_parameters() keeps a memo of tensors it has already yielded), so it seems to me that duplicates coming from a single module are not a problem: parameters() returns each shared tensor only once.
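A quick way to check that deduplication (a sketch; `Tied` and the sizes are arbitrary): register the same `nn.Embedding` under two attribute names and count what `parameters()` yields.

```python
import torch.nn as nn

class Tied(nn.Module):
    def __init__(self):
        super().__init__()
        shared = nn.Embedding(10, 4)
        self.enc_emb = shared
        self.dec_emb = shared  # same instance registered under a second name

print(len(list(Tied().parameters())))  # 1: the shared weight appears once
```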
You can still run into trouble with two parameter groups (the dictionary form): each module yields each of its parameters only once, but if the two groups have a parameter in common, constructing the optimizer raises
ValueError: some parameters appear in more than one parameter group
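For illustration, a sketch of that failure mode (the modules and the lr value are arbitrary): the second group shares a tensor with the first, so construction fails.

```python
import torch
import torch.nn as nn

shared = nn.Embedding(10, 4)
head = nn.Linear(4, 2)

try:
    torch.optim.Adam([
        {"params": list(shared.parameters()) + list(head.parameters())},
        {"params": shared.parameters(), "lr": 1e-4},  # overlaps with group 0
    ])
except ValueError as e:
    print(e)  # some parameters appear in more than one parameter group
```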