Clarifying init_hidden method in word_language_model example

I might be missing something but I’m confused about the init_hidden method here,

In the first line, it seems to grab the first item produced by the generator self.parameters() and return a Variable filled with zeros based on that tensor's data. My question is: how do we know which parameter will be returned by the line,

weight = next(self.parameters()).data

Some testing on my local machine shows that when I instantiate a model and call next(model.parameters()) I get the embedding layer's data (i.e. a Tensor of shape (num_tokens, embedding_size)). I don’t understand what the line,

Variable(weight.new(self.nlayers, bsz, self.nhid).zero_())

has to do with the embedding layer? Are we just trying to get data that is the same type as weight?
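For context, here is a minimal sketch of the model in question. The sizes are made up for illustration, and the attribute names (nhid, nlayers) are assumed to match the example; I've dropped the Variable wrapper, which modern PyTorch no longer needs:

```python
import torch
import torch.nn as nn

class RNNModel(nn.Module):
    def __init__(self, ntoken=10, ninp=4, nhid=5, nlayers=2):
        super().__init__()
        self.nhid, self.nlayers = nhid, nlayers
        # The embedding is the first parameter yielded by self.parameters()
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers)

    def init_hidden(self, bsz):
        weight = next(self.parameters()).data
        # weight.new(...) creates an uninitialized tensor with the same dtype
        # and device as weight; .zero_() then fills it with zeros in place.
        return (weight.new(self.nlayers, bsz, self.nhid).zero_(),
                weight.new(self.nlayers, bsz, self.nhid).zero_())

model = RNNModel()
h0, c0 = model.init_hidden(bsz=3)
print(h0.shape)  # torch.Size([2, 3, 5])
```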


Hi Gabriel,

good catch! It indeed doesn’t have anything to do with the embedding; it is a trick.
What it does is grab any parameter of the model (whichever happens to come first) and use it to instantiate a new tensor on the same device: CPU if the model/its parameters are on CPU, or the same GPU as that parameter if the model has been transferred with model.cuda(). The new tensor also inherits the parameter’s dtype.
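A toy demonstration of that device/dtype-following behaviour, with an nn.Linear standing in for the real model:

```python
import torch
import torch.nn as nn

# Any module works; the point is only that it has parameters.
model = nn.Linear(3, 2).double()      # parameters are float64 (and on CPU here)
weight = next(model.parameters()).data

# The new tensor inherits dtype and device from weight, not from any default.
hidden = weight.new(2, 4, 5).zero_()
print(hidden.dtype, hidden.device)    # torch.float64 cpu

# Newer PyTorch versions offer a more direct spelling of the same trick:
hidden2 = weight.new_zeros(2, 4, 5)
```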

There are occasional discussions about how best to do this (e.g. the issue below), but I must admit I’m not aware of an officially recommended way to write it.

Best regards



I get it now. Thanks for the explanation and the link! Very helpful.

Does this trick still apply in 1.0.0, or is there a better way to do it today?