Clarifying init_hidden method in word_language_model example

I might be missing something but I’m confused about the init_hidden method here,

In the first line, it seems to grab the first item produced by the generator self.parameters() and return a Variable filled with zeros based on that tensor's data. My question is: how do we know which parameter will be returned by the line,

weight = next(self.parameters()).data

Some testing on my local machine shows that when I instantiate a model and call next(model.parameters()) I get the embedding layer's data (i.e. a Tensor of shape (num_tokens, embedding_size)). I don’t understand what the line,

Variable(weight.new(self.nlayers, bsz, self.nhid).zero_())

has to do with the embedding layer? Are we just trying to get data that is the same type as weight?
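For context, here is a minimal sketch of the model in question. The sizes are made up for illustration, and the attribute names (nhid, nlayers) are assumed to match the example; I've dropped the Variable wrapper, which modern PyTorch no longer needs:

```python
import torch
import torch.nn as nn

class RNNModel(nn.Module):
    def __init__(self, ntoken=10, ninp=4, nhid=5, nlayers=2):
        super().__init__()
        self.nhid, self.nlayers = nhid, nlayers
        # The embedding is the first parameter yielded by self.parameters()
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers)

    def init_hidden(self, bsz):
        weight = next(self.parameters()).data
        # weight.new(...) creates an uninitialized tensor with the same dtype
        # and device as weight; .zero_() then fills it with zeros in place.
        return (weight.new(self.nlayers, bsz, self.nhid).zero_(),
                weight.new(self.nlayers, bsz, self.nhid).zero_())

model = RNNModel()
h0, c0 = model.init_hidden(bsz=3)
print(h0.shape)  # torch.Size([2, 3, 5])
```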


Hi Gabriel,

good catch! It indeed doesn’t have anything to do with the embedding; it is a trick.
What it does is grab any parameter of the model (whichever happens to come first) and use it to instantiate a new tensor on the same device: CPU if the model/its parameters are on CPU, or the same GPU as that parameter if the model has been transferred with model.cuda(). The new tensor also inherits the parameter’s dtype.
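A toy demonstration of that device/dtype-following behaviour, with an nn.Linear standing in for the real model:

```python
import torch
import torch.nn as nn

# Any module works; the point is only that it has parameters.
model = nn.Linear(3, 2).double()      # parameters are float64 (and on CPU here)
weight = next(model.parameters()).data

# The new tensor inherits dtype and device from weight, not from any default.
hidden = weight.new(2, 4, 5).zero_()
print(hidden.dtype, hidden.device)    # torch.float64 cpu

# Newer PyTorch versions offer a more direct spelling of the same trick:
hidden2 = weight.new_zeros(2, 4, 5)
```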

There are occasional discussions about how best to do this (e.g. the issue below), but I must admit I’m not aware of an officially recommended way to write it.

Best regards



I get it now. Thanks for the explanation and the link! Very helpful.

Does this trick still apply in 1.0.0, or is there a better way to do it today?