Training an isolated part of a module

One passes integer indices of shape [batch_size] to the Embedding layer, while the other is already a dense tensor of shape [batch_size, vocab_size]. So in the second case I need to do something like torch.matmul(output, embedding.weight) instead of embedding(output).
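To make the difference concrete, here is a minimal sketch. The sizes and the names `idx`, `logits`, and `soft` are made up for illustration and are not from my actual model. It shows that multiplying a one-hot [batch_size, vocab_size] tensor by `embedding.weight` gives the same result as the index lookup, and that the matmul form also accepts a soft distribution over the vocabulary and lets gradients flow back to it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this example.
batch_size, vocab_size, embed_dim = 4, 10, 3
embedding = nn.Embedding(vocab_size, embed_dim)

# Case 1: integer indices of shape [batch_size] -> ordinary lookup.
idx = torch.randint(0, vocab_size, (batch_size,))
by_lookup = embedding(idx)  # [batch_size, embed_dim]

# Case 2: a dense [batch_size, vocab_size] tensor -> matrix product with the weight.
one_hot = F.one_hot(idx, num_classes=vocab_size).float()
by_matmul = torch.matmul(one_hot, embedding.weight)  # [batch_size, embed_dim]

# With one-hot rows, both routes select the same rows of embedding.weight.
same = torch.allclose(by_lookup, by_matmul)

# The matmul route also works for a soft distribution (e.g. a softmax output),
# and it is differentiable with respect to that distribution.
logits = torch.randn(batch_size, vocab_size, requires_grad=True)
soft = torch.softmax(logits, dim=-1)
soft_embedded = torch.matmul(soft, embedding.weight)
soft_embedded.sum().backward()
```

embedding(output) would not work for the dense case, since nn.Embedding expects integer indices rather than a float tensor of shape [batch_size, vocab_size].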

As for the issue with forward, I was referring to this post: "Any different between model(input) and model.forward(input)".
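As far as I understand it, the practical difference is that model(input) goes through nn.Module.__call__, which runs any registered hooks around forward, while calling model.forward(input) directly skips them. A small sketch to check this, using a throwaway nn.Linear and a hypothetical `calls` list as the hook's side effect:

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)
calls = []  # records each time the forward hook fires

def record_hook(module, inputs, output):
    # Returning None leaves the module's output unchanged.
    calls.append("hook")

handle = layer.register_forward_hook(record_hook)

x = torch.randn(1, 2)

out_call = layer(x)            # goes through __call__, so the hook fires
n_after_call = len(calls)

out_forward = layer.forward(x) # bypasses __call__, so the hook does not fire
n_after_forward = len(calls)

handle.remove()
```

The outputs are the same either way here; only the hook behaviour differs, which is why calling the module itself is usually recommended.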