I might be missing something, but I’m confused about the init_hidden method here.
In its first line it seems to grab the first item produced by the generator self.parameters() and then return a Variable filled with zeros based on the data shape. My question is: how do we know which set of parameters will be returned by the line,
weight = next(self.parameters()).data
Some testing on my local machine shows that when I instantiate a model and call next(model.parameters()) I get the embedding layer data (i.e. a Tensor of shape (num_tokens, embedding_size)). I don’t understand what the line,
Good catch! It does indeed have nothing to do with the embedding. It is a trick.
What it does is grab a parameter of the model (any one would do; next(self.parameters()) simply yields the first) and use it to instantiate (through .data.new) a new tensor of the same type on the same device (i.e. on the CPU if the model and its parameters are on the CPU, or on the same GPU as the parameter if the model has been moved with model.cuda()).
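To make the trick concrete, here is a minimal sketch. TinyRNNModel and its sizes are made up for illustration and are not the model from the example being discussed. It also uses new_zeros instead of the original .data.new, which I believe is the closer equivalent in current PyTorch. The first parameter turns out to be the embedding weight only because the embedding is the first submodule registered in __init__.

```python
import torch
import torch.nn as nn


class TinyRNNModel(nn.Module):
    """Hypothetical model, loosely shaped like a word-level language model."""

    def __init__(self, num_tokens=10, embedding_size=4, hidden_size=3, num_layers=2):
        super().__init__()
        # parameters() yields parameters in the order submodules are registered,
        # so the embedding weight comes first here.
        self.encoder = nn.Embedding(num_tokens, embedding_size)
        self.rnn = nn.GRU(embedding_size, hidden_size, num_layers)
        self.hidden_size = hidden_size
        self.num_layers = num_layers

    def init_hidden(self, batch_size):
        # Only the dtype and device of this tensor are used,
        # never its values or its shape.
        weight = next(self.parameters())
        return weight.new_zeros((self.num_layers, batch_size, self.hidden_size))


model = TinyRNNModel()
first_name, first_param = next(model.named_parameters())
hidden = model.init_hidden(batch_size=5)

print(first_name, tuple(first_param.shape))
print(tuple(hidden.shape), hidden.dtype, hidden.device)
```

The shape of the hidden state comes entirely from the arguments passed to new_zeros; the parameter only contributes its dtype and device, so after model.cuda() the same init_hidden would return a tensor on that GPU.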
There are occasional discussions about how to do this (e.g. the issue below), but I must admit I’m not actually aware of a recommended way to write this efficiently.