Is the LSTM's input transform superfluous when preceded by an embedding?

It looks like we often provide our own embedding prior to the LSTM, and then set input_size == hidden_size for the LSTM, e.g. http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html :

    # embedding already projects token ids to vectors of size hidden_size
    self.embedding = nn.Embedding(input_size, hidden_size)
    # GRU then takes hidden_size-dimensional inputs
    self.gru = nn.GRU(hidden_size, hidden_size)

It seems like this is kind of ‘wasteful’, since it adds an additional hidden_size x hidden_size matrix multiply (one per gate, via the input-to-hidden weights) at the input of the LSTM/GRU, which we don’t actually need?
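For concreteness, here is a small sketch (the sizes are made up) showing that the GRU does carry its own input-to-hidden matrices on top of the embedding, which is the extra multiply I'm referring to:

    import torch
    import torch.nn as nn

    # hypothetical sizes, just for illustration
    input_size, hidden_size = 10000, 256

    embedding = nn.Embedding(input_size, hidden_size)
    gru = nn.GRU(hidden_size, hidden_size)

    # Input-to-hidden weights: 3 gates stacked, each hidden_size x hidden_size.
    # These are the "extra" multiplies on top of the embedding lookup.
    print(gru.weight_ih_l0.shape)   # torch.Size([768, 256])
    print(gru.weight_hh_l0.shape)   # torch.Size([768, 256])

    # Forward pass: token ids -> embedding -> GRU
    tokens = torch.randint(0, input_size, (5, 1))   # (seq_len, batch)
    hidden = torch.zeros(1, 1, hidden_size)         # (num_layers, batch, hidden)
    output, hidden = gru(embedding(tokens), hidden)
    print(output.shape)                             # torch.Size([5, 1, 256])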