How BiLSTM works with padding/pack_padded_sequence

I understand how padding and pack_padded_sequence work, but I have a question about how they are applied to a bidirectional LSTM.

  1. Does the BiLSTM (from nn.LSTM) automatically apply the reverse of the sequence (also when using pad_packed_sequence)?

  2. If yes, does the padding affect the first/last timesteps?

    For example: seq1 = [a, b, c, d, e], seq2 = [x, y, z],
    and after padding: seq1 = [a, b, c, d, e], seq2 = [x, y, z, 0, 0].
    If we input seq2, does the BiLSTM
          - take inputs of [x, y, z, 0, 0] and [0, 0, z, y, x] (case 1),
      or  - take inputs of [x, y, z, 0, 0] and [z, y, x, 0, 0] (case 2)?
    

Would you please help me clarify these points? Thank you very much.
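To make the setup concrete, here is a minimal sketch of what I mean by padding and packing (the feature size, hidden size, and random values are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy batch matching the example: seq1 has 5 valid steps, seq2 has 3.
batch = torch.randn(2, 5, 1)          # (batch, time, features)
batch[1, 3:] = 0.0                    # zero-pad the last two steps of seq2
lengths = torch.tensor([5, 3])        # true lengths, sorted descending

lstm = nn.LSTM(input_size=1, hidden_size=4, bidirectional=True,
               batch_first=True)

packed = pack_padded_sequence(batch, lengths, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)  # torch.Size([2, 5, 8]) -- forward and reverse concatenated
```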

As far as I know, case 2 is the correct way.
Edit: I removed a misleading expression; please refer to the discussion below.
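One way to check this empirically is to compare a packed run of the padded sequence against a run of the unpadded sequence; a minimal sketch (the seed, sizes, and tolerance are arbitrary assumptions):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=4, bidirectional=True,
               batch_first=True)

seq2 = torch.randn(1, 3, 1)                              # [x, y, z], no padding
padded = torch.cat([seq2, torch.zeros(1, 2, 1)], dim=1)  # [x, y, z, 0, 0]

# Padded + packed run.
packed = pack_padded_sequence(padded, torch.tensor([3]), batch_first=True)
packed_out, _ = lstm(packed)
out_packed, _ = pad_packed_sequence(packed_out, batch_first=True,
                                    total_length=5)

# Unpadded reference run.
out_ref, _ = lstm(seq2)

# The three valid steps match in both directions, and the padded output
# steps come back as zeros.
print(torch.allclose(out_packed[:, :3], out_ref, atol=1e-6))  # True
print(out_packed[0, 3:].abs().sum())                          # tensor(0.)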

Is it because of the lengths parameter that we pass while packing that the padding is not included in the network's input?

I think the length information is not even required.
Note that we pad the short sequences with zeros. An RNN given hidden=None at the first frame actually creates hidden=torch.zeros(...). In this case, [z, y, x, 0, 0] and [0, 0, z, y, x] produce the same results (at least when a zero input and a zero state leave the state at zero).
But theoretically, an inverse ([x, y, z, 0, 0] -> [0, 0, z, y, x]) plus a shift ([0, 0, z, y, x] -> [z, y, x, 0, 0]) is required in case the padding values are not zeros.
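If you ever need that inverse-plus-shift manually (e.g., without packing), a sketch of one way to build it; reverse_padded is a hypothetical helper for illustration, not a PyTorch API:

```python
import torch

def reverse_padded(batch, lengths):
    # Reverse each sequence in time while keeping the padding at the end:
    # [x, y, z, 0, 0] -> [z, y, x, 0, 0] (the inverse + shift above).
    # batch: (batch, time, features); works for non-zero padding too,
    # since the padded tail is copied over unchanged.
    out = batch.clone()
    for i, length in enumerate(lengths):
        out[i, :length] = batch[i, :length].flip(0)  # flip only the valid prefix
    return out

batch = torch.tensor([[[1.], [2.], [3.], [0.], [0.]]])  # [x, y, z, 0, 0]
print(reverse_padded(batch, [3]).squeeze())             # tensor([3., 2., 1., 0., 0.])
```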