Hi! As far as I understand, the strength of PyTorch is supposed to be that it works with dynamic computational graphs. In the context of NLP, that means that sequences with variable lengths do not necessarily need to be padded to the same length. But, if I want to use PyTorch DataLoader, I need to pad my sequences anyway because the DataLoader only takes tensors - given that me as a total beginner does not want to build some customized collate_fn.

Now this makes me wonder - doesn’t this wash away the whole advantage of dynamic computational graphs in this context?

Also, if I pad my sequences to feed it into the DataLoader as a tensor with many zeros as padding tokens at the end (in the case of word ids), will it have any negative effect on my training since PyTorch may not be optimized for computations with padded sequences (since the whole premise is that it can work with variable sequence lengths in the dynamic graphs), or does it simply not make any difference?

Thanks