About pack_padded_sequence and pad_packed_sequence

Hello everyone,
I have a question: can I use the pack_padded_sequence and pad_packed_sequence functions when working with the Transformer and MultiheadAttention classes?
The shape of my input data is [batch_size, num_sequences, max_sequence_length]. Along dim=1, some sequences consist entirely of padding values.
Example:
Input data:
[[[3, 2, 4, 5],
  [2, 3, 1, 1],
  [4, 3, 9, 1]],
 [[6, 3, 2, 5],
  [1, 1, 1, 1],
  [1, 1, 1, 1]]]
(batch_size=2, num_sequences=3, max_sequence_length=4, padding value=1)
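
For concreteness, here is a minimal sketch of how I detect the all-padding sequences (PAD_IDX = 1 is my padding index; everything else follows the example above):

```python
import torch

PAD_IDX = 1  # my padding value

x = torch.tensor([[[3, 2, 4, 5],
                   [2, 3, 1, 1],
                   [4, 3, 9, 1]],
                  [[6, 3, 2, 5],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1]]])  # [batch_size=2, num_sequences=3, max_sequence_length=4]

# True where a sequence along dim=1 consists entirely of padding
all_pad = (x == PAD_IDX).all(dim=-1)
print(all_pad)
# tensor([[False, False, False],
#         [False,  True,  True]])
```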

I want to remove those all-padding sequences before running the forward pass of the classes above.
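
In case packing does not apply to attention modules, this is roughly the workaround I have in mind: filter the all-padding sequences out, run attention only on the remaining ones, and scatter the outputs back. This is only a sketch; the embedding size, head count, and zero-filling of the removed rows are assumptions of mine, not anything from the docs:

```python
import torch
import torch.nn as nn

PAD_IDX = 1
B, S, L, E = 2, 3, 4, 8  # E (embedding dim) is an arbitrary choice for the sketch

x = torch.tensor([[[3, 2, 4, 5],
                   [2, 3, 1, 1],
                   [4, 3, 9, 1]],
                  [[6, 3, 2, 5],
                   [1, 1, 1, 1],
                   [1, 1, 1, 1]]])

emb = nn.Embedding(10, E, padding_idx=PAD_IDX)
attn = nn.MultiheadAttention(E, num_heads=2, batch_first=True)

flat = x.view(B * S, L)                    # one row per sequence
keep = ~(flat == PAD_IDX).all(dim=-1)      # False for all-padding rows

valid = emb(flat[keep])                    # [num_valid, L, E]
mask = flat[keep] == PAD_IDX               # per-token padding mask for the kept rows
out_valid, _ = attn(valid, valid, valid, key_padding_mask=mask)

out = out_valid.new_zeros(B * S, L, E)     # removed rows stay as zeros
out[keep] = out_valid                      # scatter results back into place
out = out.view(B, S, L, E)
```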
Thanks in advance for any help.