Attention for Sentence Classification

I have a conceptual problem understanding how we can define a variable-length attention network, since each batch passed through an RNN has a different sequence length.
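To make the question concrete, here is a minimal sketch of the kind of masked attention I have in mind (the module name and the simple additive scoring function are my own illustration, not taken from any particular paper): sequences in a batch are padded to a common length, and the padded positions are masked out before the softmax so they receive zero attention weight.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedAttention(nn.Module):
    """Attention over RNN outputs where padded time steps are masked out."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)  # one scalar score per time step

    def forward(self, rnn_outputs, lengths):
        # rnn_outputs: (batch, max_len, hidden_dim), zero-padded past each length
        # lengths:     (batch,) true length of each sequence
        scores = self.score(rnn_outputs).squeeze(-1)            # (batch, max_len)
        max_len = rnn_outputs.size(1)
        # mask[i, t] is True for real tokens, False for padding
        mask = torch.arange(max_len, device=lengths.device)[None, :] < lengths[:, None]
        scores = scores.masked_fill(~mask, float("-inf"))       # padding gets weight 0
        weights = F.softmax(scores, dim=1)                      # (batch, max_len)
        return (weights.unsqueeze(-1) * rnn_outputs).sum(dim=1) # (batch, hidden_dim)
```

With this kind of masking, the attention layer itself seems length-agnostic, which is what makes me wonder how much padding is actually required.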

Do we need to pad all the batches so that the length of each batch is the maximum possible sentence length?

If this is the case, how can I do it with torchtext?
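For reference, this is roughly the torchtext setup I am imagining, using the classic Field/BucketIterator API (moved under torchtext.legacy in newer releases); the tiny in-memory dataset is just there to make the snippet self-contained, and as I understand it BucketIterator pads each batch only to the longest sentence within that batch, not to a global maximum:

```python
from torchtext.legacy import data  # plain `torchtext.data` before torchtext 0.9

# Field pads every batch with "<pad>" and can return the true lengths.
TEXT = data.Field(sequential=True, include_lengths=True)
LABEL = data.LabelField()
fields = [("text", TEXT), ("label", LABEL)]

# Tiny in-memory dataset just to make the example runnable.
examples = [
    data.Example.fromlist(["a short sentence", "pos"], fields),
    data.Example.fromlist(["a slightly longer example sentence here", "neg"], fields),
]
train_dataset = data.Dataset(examples, fields)
TEXT.build_vocab(train_dataset)
LABEL.build_vocab(train_dataset)

train_iter = data.BucketIterator(
    train_dataset,
    batch_size=2,
    sort_key=lambda ex: len(ex.text),  # bucket similar lengths to reduce padding
    sort_within_batch=True,            # needed later for pack_padded_sequence
)

for batch in train_iter:
    # text: (max_len_in_batch, batch_size); lengths: (batch_size,)
    text, lengths = batch.text
```

Is something like this the right approach, or is padding to a global maximum length ever actually necessary?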