Implementing a many-to-many LSTM (need help)

I need a word-to-word (many-to-many) model, but I'm finding it hard to get right.
I think Y_hat should have shape [batch_size, target_sequence, vocab_size] when the target has shape [batch_size, sequence].

In my implementation, I can't get the output to have that shape.
Please help.

A code snippet of my model follows:

x = self.embed(input_x)                        # [batch, seq_len, embed_dim]
packed_x = pack_padded_sequence(x, seq, batch_first=True)

packed_h, (packed_h_t, packed_c_t) = self.rnn(packed_x, (h0, c0))
unpacked_x, _ = pad_packed_sequence(packed_h, batch_first=True)  # [batch, seq_len, hidden]
_output = self.linear(unpacked_x)              # [batch, seq_len, vocab_size]
output = F.log_softmax(_output, dim=-1)        # softmax over vocab, not dim=1 (the sequence dim)
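For reference, here is a minimal self-contained sketch of the same idea. The class and hyperparameter names (`Seq2SeqTagger`, `embed_dim`, `hidden_dim`) are my own placeholders, not from your code. The key points are applying the `Linear` layer to every timestep of the unpacked LSTM output and taking `log_softmax` over the last (vocab) dimension, which yields the [batch_size, seq_len, vocab_size] shape you want:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Seq2SeqTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, vocab_size)

    def forward(self, input_x, lengths):
        x = self.embed(input_x)                      # [B, T, E]
        packed_x = pack_padded_sequence(x, lengths, batch_first=True,
                                        enforce_sorted=False)
        packed_h, _ = self.rnn(packed_x)             # default zero initial state
        unpacked_h, _ = pad_packed_sequence(packed_h, batch_first=True)  # [B, T, H]
        logits = self.linear(unpacked_h)             # [B, T, V]
        return F.log_softmax(logits, dim=-1)         # distribution over vocab per step

vocab_size = 10
model = Seq2SeqTagger(vocab_size)
batch = torch.tensor([[1, 2, 3, 4],
                      [5, 6, 0, 0]])                 # 0 = padding index
lengths = torch.tensor([4, 2])
out = model(batch, lengths)
print(out.shape)  # torch.Size([2, 4, 10])
```

For the loss, `F.nll_loss` expects the class dimension second, so you can either transpose (`F.nll_loss(out.transpose(1, 2), targets, ignore_index=0)`) or flatten both tensors to [B*T, V] and [B*T].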