What is happening in this forward function?

Hi,

I’m trying to understand the language model example here: https://github.com/pytorch/examples/tree/master/word_language_model

I can’t understand what’s happening in the last two lines of the forward method in this model (I removed irrelevant code):

import torch.nn as nn

class RNNModel(nn.Module):
    """Container module with an encoder, a recurrent module, and a decoder."""

    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super(RNNModel, self).__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

        self.init_weights()

        self.nhid = nhid
        self.nlayers = nlayers

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2)))
        return decoded.view(output.size(0), output.size(1), decoded.size(1)), hidden

    def init_hidden(self, bsz):
        weight = next(self.parameters())
        return (weight.new_zeros(self.nlayers, bsz, self.nhid),
                weight.new_zeros(self.nlayers, bsz, self.nhid))

I’d appreciate it if someone could shed some light on this. Thanks :slight_smile:

The self.decoder here is just the linear layer that goes from the hidden units to the output (vocabulary) size.
To decode the outputs for every time step and every element in the batch at once, the first view reshapes the output tensor (where x separates dimensions) from nb_steps x batch x hidden_size to nb_steps*batch x hidden_size.
The second view then reshapes the decoded result back from nb_steps*batch x ntoken to nb_steps x batch x ntoken.
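
To make the shapes concrete, here is a minimal sketch (with made-up sizes nb_steps=5, batch=3, nhid=20, ntoken=100, chosen purely for illustration) that reproduces the two view calls around the decoder:

import torch
import torch.nn as nn

nb_steps, batch, nhid, ntoken = 5, 3, 20, 100      # hypothetical sizes
decoder = nn.Linear(nhid, ntoken)

output = torch.randn(nb_steps, batch, nhid)        # what the LSTM returns: nb_steps x batch x nhid
flat = output.view(output.size(0) * output.size(1), output.size(2))
print(flat.shape)        # torch.Size([15, 20])   -> nb_steps*batch x nhid
decoded = decoder(flat)
print(decoded.shape)     # torch.Size([15, 100])  -> nb_steps*batch x ntoken
decoded = decoded.view(output.size(0), output.size(1), decoded.size(1))
print(decoded.shape)     # torch.Size([5, 3, 100]) -> nb_steps x batch x ntoken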

Note that with recent versions of PyTorch this is not necessary: nn.Linear accepts tensors with more than two dimensions and treats every dimension but the last one as a batch dimension.
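
For instance, on a recent PyTorch version the following sketch (same made-up sizes as above) works without any view calls:

import torch
import torch.nn as nn

decoder = nn.Linear(20, 100)
output = torch.randn(5, 3, 20)   # nb_steps x batch x nhid
decoded = decoder(output)        # Linear is applied along the last dimension only
print(decoded.shape)             # torch.Size([5, 3, 100]) -> nb_steps x batch x ntoken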

So that’s why I didn’t get any errors when not doing all this reshaping?

Yes, this was needed in older versions only!

Then I’m still at a loss as to why my model isn’t working :stuck_out_tongue:

If you have the time, it’d be wonderful if you could take a look at my code and see what I’m doing wrong… I’ve given up trying to figure it out myself, help :(

Thanks again for your quick answer!

I saw the post, but I’m afraid I have no experience whatsoever in RNN training or implementation :confused:

Thank you for looking!