I have a seq2seq task and a model that starts with an Embedding layer to process the tokenized input. I noticed that performance is poor when I use nn.Embedding, but much better when I swap in nn.Linear.
Shouldn’t this:
self.embedding_layer = nn.Embedding(self.vocab_size, 512)
embedding_out = self.embedding_layer(input)
be the same as this?
self.embedding_layer = nn.Sequential(
    nn.Linear(self.vocab_size, 256),
    nn.Linear(256, 512)
)
# F.one_hot returns int64, so the result must be cast to float,
# and num_classes must be passed so the width matches the first Linear
embedding_out = self.embedding_layer(
    F.one_hot(input, num_classes=self.vocab_size).float()
)
Yet the latter produces much better results. Does anyone have any insight into why? I would much prefer to use the Embedding layer, since it is more readable, but I cannot figure out why it performs so much worse.
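For context, my expectation comes from the fact that a single bias-free Linear applied to one-hot vectors is mathematically the same lookup as an Embedding, once the weights are tied. A minimal sketch of that equivalence (vocab_size and d_model here are placeholder values, not my real config):

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model = 1000, 512  # placeholder sizes

emb = nn.Embedding(vocab_size, d_model)

# A single Linear with no bias, its weight tied to the embedding table;
# nn.Linear stores its weight as (out_features, in_features), hence the transpose
lin = nn.Linear(vocab_size, d_model, bias=False)
with torch.no_grad():
    lin.weight.copy_(emb.weight.t())

tokens = torch.randint(0, vocab_size, (4, 16))
one_hot = F.one_hot(tokens, num_classes=vocab_size).float()

# Prints True: the lookup and the one-hot matmul produce identical outputs
print(torch.allclose(emb(tokens), lin(one_hot)))

I realize my two stacked Linear layers (with biases and a 256-wide bottleneck) are not literally that single map, which is part of why the size of the performance gap confuses me.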