How to pad sequences of variable length in an NLP task

In NLP tasks such as neural machine translation, the source sentences have different lengths. If I want to feed a batch into an RNN, they all have to be the same length.
I have read many examples, but they only put one sample into the RNN at a time. If I pad with zeros before passing the word embeddings into the network, will that work? Will those zeros change my final result or not?
I want to know the best way to do this.

Try pad_sequence.
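In PyTorch, `torch.nn.utils.rnn.pad_sequence` pads a list of variable-length tensors up to the length of the longest one. A minimal sketch (the token ids and lengths below are just illustrative):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three "sentences" of different lengths, each a 1-D tensor of token ids
# (the ids themselves are made up for illustration).
a = torch.tensor([4, 12, 7])
b = torch.tensor([9, 3, 25, 6, 1])
c = torch.tensor([8, 2])

# Pad with zeros to the length of the longest sequence.
# batch_first=True gives a tensor of shape (batch, max_len) -> (3, 5).
batch = pad_sequence([a, b, c], batch_first=True, padding_value=0)
print(batch)
# tensor([[ 4, 12,  7,  0,  0],
#         [ 9,  3, 25,  6,  1],
#         [ 8,  2,  0,  0,  0]])
```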


As for the consequences of padding with zeros: there are none. From a very high-level view, think of it as writing a sentence with many white spaces before it. This does not alter the meaning in any way; to the reader, the sentence is the same with or without the preceding white spaces, and neural networks are able to learn this indifference.
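If you then embed the padded indices, one way to keep the padding "neutral" (a sketch assuming PyTorch and made-up vocabulary/hidden sizes) is to reserve index 0 for padding and pass `padding_idx=0` to `nn.Embedding`, so padded positions always map to a zero vector and never receive gradient updates:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

vocab_size, embed_dim, hidden_dim = 100, 8, 16  # illustrative sizes

# Index 0 is reserved for padding; padding_idx keeps its embedding at zero.
embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

# A padded batch of token ids (0 = padding), e.g. produced by pad_sequence.
batch = pad_sequence(
    [torch.tensor([4, 12, 7]), torch.tensor([9, 3, 25, 6, 1])],
    batch_first=True, padding_value=0,
)

output, hidden = rnn(embedding(batch))  # output shape: (2, 5, hidden_dim)
```

If you also want the RNN to skip the padded time steps entirely, `torch.nn.utils.rnn.pack_padded_sequence` can be applied to the embedded batch before the RNN, but that is a separate refinement.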