This is probably a novice question, but unfortunately I couldn’t find an exact answer after some searching, so I hope to find some help here.

I am trying to train a simple LSTM model (using nn.LSTM) for sentiment classification with pretrained GloVe embeddings trained on my corpus. I padded the training data with zeros to align the timesteps within each batch during training, but it occurred to me that the pretrained embedding contains no vector for the padding index 0, and I did not set padding_idx=0. Nevertheless, the model trained without errors. I am therefore wondering how the model behaves when a set of pretrained weights is used and the input contains indices that have no embedding mapping: are those initialized with some random vector, simply omitted, or handled in some other way?
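To make the setup concrete, a minimal sketch of it might look like the following. The vocabulary size, embedding dimension, hidden size, and the random matrix standing in for the GloVe weights are all placeholders chosen for illustration, not the real data. It only shows what a lookup of index 0 returns when padding_idx is not set.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the GloVe matrix: 10 "words", 8 dimensions (assumed sizes).
pretrained = torch.randn(10, 8)

# padding_idx is NOT set, so index 0 is treated like any other row.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Two sequences of true lengths 4 and 2, right-padded with zeros.
batch = torch.tensor([[3, 5, 2, 7],
                      [4, 1, 0, 0]])

embedded = embedding(batch)          # shape: (2, 4, 8)
output, (h_n, c_n) = lstm(embedded)  # output shape: (2, 4, 16)

# The padded positions simply receive whatever vector sits in row 0
# of the weight matrix; nothing is skipped or raised as an error.
pad_vector_is_row_zero = torch.equal(embedded[1, 2], embedding.weight[0])
```

In this sketch the padded positions are looked up like any other index, which would explain why training runs without complaint; whether row 0 holds a real word vector or a randomly initialized one depends on how the weight matrix was built.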

In addition, the documentation states that if padding_idx is set, the gradient for that index will always be zero. Does that imply the model can be trained correctly (and as efficiently) without packing the input with pack_padded_sequence and then inverting it with pad_packed_sequence, given the zero gradients of the padded values?
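The two options being compared can be sketched roughly as below. This is an illustration under assumed sizes and toy indices, not a definitive answer: variant A feeds the padded batch straight into the LSTM with padding_idx=0, while variant B packs it with pack_padded_sequence and unpacks with pad_packed_sequence.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Assumed toy sizes; row 0 is the padding row and starts as all zeros.
embedding = nn.Embedding(10, 8, padding_idx=0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

batch = torch.tensor([[3, 5, 2, 7],
                      [4, 1, 0, 0]])
lengths = torch.tensor([4, 2])  # true lengths, sorted in descending order

embedded = embedding(batch)

# Variant A: feed the padded batch directly; the LSTM still steps
# through the padded positions of the shorter sequence.
_, (h_padded, _) = lstm(embedded)

# Variant B: pack first, so the LSTM stops at each sequence's true length.
packed = pack_padded_sequence(embedded, lengths, batch_first=True)
packed_out, (h_packed, _) = lstm(packed)
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# Reference: run the short sequence on its own, without any padding.
_, (h_ref, _) = lstm(embedded[1:2, :2])

# The gradient of the padding row stays zero, as the documentation says.
h_padded.sum().backward()
pad_grad = embedding.weight.grad[0]
```

Here the packed final hidden state of the short sequence matches the unpadded reference run, whereas in variant A the hidden state keeps being updated over the padded timesteps. The zero gradient only concerns the embedding row itself, so it does not by itself seem to make the two variants equivalent.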

Thank you. Any suggestions are highly appreciated.