How can I ignore padding during training?

Hi, I have a dataset where each sample is a 2D tensor of shape (X, 100), where X is an arbitrary number.
I want to feed a neural network in batches, but the samples have variable sequence lengths, so I used the pad_sequence method to create the batch tensor of padded samples (of shape (batch_size, sequence_length, 100)).
Example (here each sample has shape (X, 3)):

L = [tensor([[ 0.7900, -0.7206,  1.2947]]),
     tensor([[ 0.5230, -0.3547, -1.8458],
             [ 0.2626,  0.0604,  0.1883]])]

L is the list of my samples: as you can see, the first has shape (1, 3) and the second has shape (2, 3).
Applying the pad_sequence method, the result is:

tensor([[[ 0.7900, -0.7206,  1.2947],
         [ 0.0000,  0.0000,  0.0000]],
        [[ 0.5230, -0.3547, -1.8458],
         [ 0.2626,  0.0604,  0.1883]]])
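
For reference, this is roughly how I build the padded batch (I'm using batch_first=True, which is why the output shape is (batch_size, sequence_length, features)):

import torch
from torch.nn.utils.rnn import pad_sequence

L = [torch.tensor([[0.7900, -0.7206, 1.2947]]),
     torch.tensor([[0.5230, -0.3547, -1.8458],
                   [0.2626, 0.0604, 0.1883]])]

# pad every sample to the length of the longest one; padded positions are filled with 0.0
padded = pad_sequence(L, batch_first=True)  # shape (2, 2, 3)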

The neural network consists of two nested networks whose results are summed together and concatenated to output a score. Does the network's result depend on the padding? How can I make it ignore the padding?
I know that Keras has a masking layer, but I don't know how to implement the equivalent in PyTorch.
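To make my question concrete, the sketch below is just my guess at how the masking could work (not something I've verified): keep the original lengths, build a boolean mask, and zero out the padded positions before summing over the sequence dimension. In my actual model I imagine the mask would have to be applied to the outputs of the two nested networks before the sum, but this shows the idea on the raw padded tensor. Is this the right approach?

import torch
from torch.nn.utils.rnn import pad_sequence

L = [torch.tensor([[0.7900, -0.7206, 1.2947]]),
     torch.tensor([[0.5230, -0.3547, -1.8458],
                   [0.2626, 0.0604, 0.1883]])]

lengths = torch.tensor([s.shape[0] for s in L])  # original lengths before padding, here [1, 2]
padded = pad_sequence(L, batch_first=True)       # (batch_size, max_len, 3)

# True where a position holds real data, False where it is padding
mask = torch.arange(padded.shape[1]).unsqueeze(0) < lengths.unsqueeze(1)  # (batch_size, max_len)

# zero out the padded rows so they cannot contribute to the sum over the sequence dimension
summed = (padded * mask.unsqueeze(-1)).sum(dim=1)  # (batch_size, 3)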
Thank you in advance.