One case feeds the Embedding layer integer indices of shape `[batch_size]`, while the other already has a tensor of shape `[batch_size, vocab_size]`. So for the latter I need to do something like `torch.matmul(output, embedding.weight)` instead of `embedding(output)`.
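As a minimal sketch (with made-up shapes, and using a one-hot tensor to stand in for `output`), the matmul path reproduces the plain lookup when the input is one-hot:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embed_dim = 10, 4
embedding = nn.Embedding(vocab_size, embed_dim)

# Case 1: integer indices of shape [batch_size]
indices = torch.tensor([1, 5, 7])
looked_up = embedding(indices)  # shape [batch_size, embed_dim]

# Case 2: a tensor over the vocab of shape [batch_size, vocab_size]
# (here a one-hot encoding of the same indices)
one_hot = F.one_hot(indices, vocab_size).float()
projected = torch.matmul(one_hot, embedding.weight)  # [batch_size, embed_dim]

print(torch.allclose(looked_up, projected))  # True
```

With a soft distribution over the vocabulary (e.g. a softmax output) instead of a strict one-hot, the matmul gives a weighted average of the embedding rows, which is exactly why the index-lookup form `embedding(output)` cannot be used there.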
As for the issue with `forward`, I was referring to this post: "Any different between model(input) and model.forward(input)".