LSTM: Why does enforce_sorted change the size of the expected hidden state?

I have several questions about the argument enforce_sorted in the function nn.utils.rnn.pack_padded_sequence:

  1. Depending on this argument, the hidden state tensors passed to the LSTM are expected to have either the dimensions (batch_size, sequence_length, hidden_size) or (batch_size, input_size, hidden_size).
    I don’t understand when the first shape is useful, or what it has to do with whether the packed sequences are sorted by descending length.
  2. Why is the actual length of each sequence relevant when the sequences are already padded to the same length?
  3. Why are unsorted sequences not ONNX compatible?
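A minimal sketch for checking these points empirically, assuming a recent PyTorch version. The sizes, the lengths [2, 5, 3] and all variable names are made up for illustration. Per the nn.LSTM documentation, h_0/c_0 have shape (num_layers * num_directions, batch, hidden_size) even with batch_first=True, and the script prints that shape for the enforce_sorted=False case. It also compares the packed final state with a run on the plain padded tensor (question 2), and sorts the batch by hand so that enforce_sorted=True can be used, which the pack_padded_sequence docs describe as only necessary for ONNX export (question 3).

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, max_len, input_size, hidden_size = 3, 5, 4, 6
num_layers = 1

lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers, batch_first=True)

# Padded batch whose lengths are NOT in descending order (hypothetical values).
x = torch.randn(batch_size, max_len, input_size)
lengths = torch.tensor([2, 5, 3])
for i, n in enumerate(lengths.tolist()):
    x[i, n:] = 0.0  # zero out the padding positions

# Initial states: (num_layers * num_directions, batch, hidden_size),
# independent of batch_first and of enforce_sorted.
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)

# (a) enforce_sorted=False: PyTorch sorts internally, permutes (h0, c0) to match,
#     and returns h_n in the original batch order.
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
_, (h_unsorted, _) = lstm(packed, (h0, c0))
print(h_unsorted.shape)  # torch.Size([1, 3, 6])

# (b) Sort by hand so enforce_sorted=True can be used, then undo the sort.
sorted_lengths, order = lengths.sort(descending=True)
packed_sorted = pack_padded_sequence(
    x[order], sorted_lengths, batch_first=True, enforce_sorted=True
)
# Permuting zero states is a no-op, but non-zero states must follow the same order.
_, (h_sorted, _) = lstm(packed_sorted, (h0[:, order], c0[:, order]))
h_manual = h_sorted[:, order.argsort()]  # inverse permutation along the batch dim
print(torch.allclose(h_unsorted, h_manual, atol=1e-6))  # (a) vs. (b)

# (c) No packing: the padding steps are fed through the LSTM as well.
out_padded, (h_padded, _) = lstm(x, (h0, c0))
# Output of each sequence at its last real time step, shape (batch, hidden).
last_valid = out_padded[torch.arange(batch_size), lengths - 1]
print(torch.allclose(h_unsorted[-1], last_valid, atol=1e-5))  # packed h_n vs. last real step
print(torch.allclose(h_unsorted[-1], h_padded[-1], atol=1e-5))  # packed h_n vs. padded h_n
```

The last comparison is where the lengths matter: without packing, the final state of the shorter sequences is taken after the zero padding has been run through the LSTM, not at their last real time step.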

Thank you very much!