Is it normal to generate 'PAD' tokens with a seq2seq model?

In the encoder, I pack the padded input sequences like this:

packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
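For context, here is a minimal self-contained sketch of that packing step (the shapes, the GRU size, and the tensors are placeholders for illustration, not my real model):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Hypothetical shapes: (max_len=5, batch=3, emb_dim=8), with lengths sorted
# in descending order, as pack_padded_sequence expects by default
# (enforce_sorted=True).
embedded = torch.randn(5, 3, 8)
input_lengths = torch.tensor([5, 4, 2])

gru = torch.nn.GRU(input_size=8, hidden_size=16)
packed = pack_padded_sequence(embedded, input_lengths)
packed_out, hidden = gru(packed)

# Unpack back to a padded tensor of shape (max_len, batch, hidden_size).
outputs, out_lengths = pad_packed_sequence(packed_out)
```

This way the GRU never runs over the PAD positions on the encoder side.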

In the decoder, during training, I generate outputs one step at a time up to MAX_LENGTH, then compute a masked loss like this:

crossEntropy = -torch.log(torch.gather(input, 1, target.view(-1, 1)))  # NLL of each target token
loss = crossEntropy.masked_select(mask).mean()  # average over non-PAD positions only
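In isolation, the masking logic looks like this (toy numbers; `PAD_token = 0` and the probability tensor are assumptions for illustration):

```python
import torch

PAD_token = 0  # assumed padding index; adjust to your vocabulary

# Toy decoder output for one time step: probabilities over a 4-word
# vocabulary for a batch of 3 target tokens.
probs = torch.tensor([[0.70, 0.10, 0.10, 0.10],
                      [0.20, 0.50, 0.20, 0.10],
                      [0.25, 0.25, 0.25, 0.25]])
target = torch.tensor([0, 1, 2])  # first position is PAD

# Positions whose target is PAD contribute nothing to the loss.
mask = target != PAD_token

# Pick out the probability assigned to each target token, take -log.
crossEntropy = -torch.log(torch.gather(probs, 1, target.view(-1, 1)))
loss = crossEntropy.masked_select(mask.view(-1, 1)).mean()
```

Note that I reshape the mask to `(batch, 1)` so it lines up with the gathered column rather than broadcasting.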

At sampling time, given an input sentence, I generate the output sentence with beam search.
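My beam search follows roughly this shape (a stripped-down sketch with a fake `decode_step` standing in for my actual decoder; the token ids and vocabulary size are made up):

```python
import torch

SOS, EOS = 1, 2   # assumed special-token ids
VOCAB = 10        # toy vocabulary size

def decode_step(token, hidden):
    # Stand-in for one real decoder step: returns log-probs over the
    # vocabulary and an updated "hidden state" (here just an int).
    torch.manual_seed(token + hidden)
    return torch.log_softmax(torch.randn(VOCAB), dim=0), hidden + 1

def beam_search(beam_width=3, max_len=10):
    # Each beam is (token sequence, cumulative log-prob, hidden state).
    beams = [([SOS], 0.0, 0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score, hidden in beams:
            if tokens[-1] == EOS:          # finished beams carry over as-is
                candidates.append((tokens, score, hidden))
                continue
            logp, new_hidden = decode_step(tokens[-1], hidden)
            topv, topi = logp.topk(beam_width)
            for v, i in zip(topv.tolist(), topi.tolist()):
                candidates.append((tokens + [i], score + v, new_hidden))
        # Keep the beam_width best candidates by cumulative log-prob.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
        if all(b[0][-1] == EOS for b in beams):
            break
    return beams[0][0]
```

In my real model nothing stops the search from ranking PAD highly, which is where the question comes from.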
Why do I still generate 'PAD' tokens before EOS? (Note: I pad the training data after the EOS token.)
Is this normal?

Can anybody help? Or could someone outline a correct, clear pipeline for text generation with an encoder-decoder framework?