It looks like we often provide our own embedding before the LSTM/GRU, and then set input_size == hidden_size for the recurrent layer, e.g. in http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html :

self.embedding = nn.Embedding(input_size, hidden_size)
self.gru = nn.GRU(hidden_size, hidden_size)
This seems kind of 'wasteful', since it adds an extra hidden_size x hidden_size matrix multiply at the input of the GRU, which we don't actually need, right?
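For what it's worth, nothing seems to require the two sizes to match: nn.GRU accepts any input_size, so the embedding can be narrower than the hidden state. A minimal sketch (the sizes below are made up purely for illustration):

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim, hidden_size = 1000, 64, 256  # illustrative sizes

# The embedding dimension is decoupled from hidden_size here;
# the GRU's input-to-hidden weight is then hidden_size x embedding_dim.
embedding = nn.Embedding(vocab_size, embedding_dim)
gru = nn.GRU(embedding_dim, hidden_size)

tokens = torch.randint(0, vocab_size, (10, 1))  # (seq_len, batch)
out, h = gru(embedding(tokens))
print(out.shape)  # (10, 1, 256): seq_len, batch, hidden_size
```

So the tutorial's choice of input_size == hidden_size looks like a simplification rather than a requirement.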