Is it normal to generate 'PAD' tokens with a seq2seq model?

In the encoder part, I pack the padded input like this:

packed = torch.nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
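For context, here is a minimal, self-contained sketch of how that packing step typically fits into an encoder. The vocabulary size, embedding dimension, hidden size, and toy inputs below are all made up for illustration; note that `pack_padded_sequence` expects the lengths in descending order unless `enforce_sorted=False` is passed.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not from the original post
embedding = nn.Embedding(10, 4)   # vocab=10, embedding dim=4
gru = nn.GRU(4, 8)                # hidden size=8, (seq_len, batch, feat) layout

# Two sequences of lengths 3 and 2, padded with token 0 to max length 3
inputs = torch.tensor([[1, 5], [2, 6], [3, 0]])   # shape (seq_len, batch)
input_lengths = torch.tensor([3, 2])              # must be descending by default

embedded = embedding(inputs)                      # (seq_len, batch, emb)
packed = nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)
packed_out, hidden = gru(packed)
outputs, lengths = nn.utils.rnn.pad_packed_sequence(packed_out)
print(tuple(outputs.shape))   # (3, 2, 8)
```

Packing ensures the GRU never runs over the PAD positions, so the encoder states are unaffected by padding.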

In the decoder part, during training, I generate outputs one step at a time up to MAX_LENGTH, and then apply a masked loss like this:

crossEntropy = -torch.log(torch.gather(input, 1, target.view(-1, 1)))
loss = crossEntropy.masked_select(mask).mean()
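To make the masking step concrete, here is a tiny self-contained version of that loss with made-up numbers. `probs` stands in for `input` (per-token probabilities after softmax), and the mask is reshaped to match `crossEntropy`'s `(N, 1)` shape so that `masked_select` does not broadcast it into the wrong positions:

```python
import torch

# Toy values, one row of probabilities per target token
probs = torch.tensor([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])
target = torch.tensor([0, 1])          # gold token ids
mask = torch.tensor([True, False])     # False = PAD position, excluded

cross_entropy = -torch.log(torch.gather(probs, 1, target.view(-1, 1)))
# reshape mask to (N, 1) so it lines up with cross_entropy element-wise
loss = cross_entropy.masked_select(mask.view(-1, 1)).mean()
print(round(loss.item(), 4))   # -log(0.7) ≈ 0.3567
```

With the PAD position masked out, only the first token contributes, so the loss is just -log(0.7).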

When sampling, an input sentence is given and the output sentence is generated with beam search.
Why do I still generate 'PAD' before EOS? (P.S. I pad the training data after the EOS token.)
Is this normal?
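One thing worth checking is how the loss mask is built. A common convention (this is an assumption about the setup, not something stated above) is to count each target's length up to and including EOS, so every PAD position after EOS is excluded from the loss; if PAD positions leak into the mask, the decoder is rewarded for predicting PAD. A sketch of that mask construction, with made-up lengths:

```python
import torch

# Hypothetical batch of 2 targets; lengths count up to and including EOS
max_len = 5
lengths = torch.tensor([3, 5])

# positions: column vector (max_len, 1); compare against (1, batch) lengths
positions = torch.arange(max_len).unsqueeze(1)     # (max_len, 1)
mask = positions < lengths.unsqueeze(0)            # (max_len, batch), bool
print(mask.int())
# first target: steps 0-2 are True (incl. EOS), steps 3-4 (PAD) are False
```

If the mask already looks like this and PAD still appears before EOS, the model has simply assigned PAD some probability mass, which beam search can then pick up.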

Can anybody help me? Or could someone outline a correct and clear pipeline for text generation with the encoder-decoder framework?