It looks like we often provide our own embedding before the LSTM/GRU, and then set input_size == hidden_size for the recurrent layer, e.g. in http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html :

self.embedding = nn.Embedding(input_size, hidden_size)
self.gru = nn.GRU(hidden_size, hidden_size)
This seems kind of 'wasteful', since it adds an extra hidden_size x hidden_size matrix multiply at the input of the GRU, which we don't actually need, right?
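For what it's worth, nothing seems to require the two sizes to match: nn.GRU accepts any input_size, so the embedding can be narrower than the hidden state. A minimal sketch (the sizes below are made up purely for illustration):

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim, hidden_size = 1000, 64, 256  # illustrative sizes

# The embedding dimension is decoupled from hidden_size here;
# the GRU's input-to-hidden weight is then hidden_size x embedding_dim.
embedding = nn.Embedding(vocab_size, embedding_dim)
gru = nn.GRU(embedding_dim, hidden_size)

tokens = torch.randint(0, vocab_size, (10, 1))  # (seq_len, batch)
out, h = gru(embedding(tokens))
print(out.shape)  # (10, 1, 256): seq_len, batch, hidden_size
```

So the tutorial's choice of input_size == hidden_size looks like a simplification rather than a requirement.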