Batch training in LSTM with variable lengths to get final (hn, cn)

Hi, I have a single sequence input with input.size() = [17, 1, 64] # (seq_len, batch, channel), and I use nn.LSTM(64, 16, bidirectional=True) to get output, (hn, cn).

In this case, output.size() = [17, 1, 32], hn.size() = [2, 1, 16], cn.size() = [2, 1, 16]. I concatenate (hn, cn) as the final feature for the downstream application, and this works well.
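For reference, a minimal sketch of this single-sequence setup with dummy data (the exact way I concatenate (hn, cn) into one feature vector is just one possible scheme, shown here for illustration):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(64, 16, bidirectional=True)

x = torch.randn(17, 1, 64)            # (seq_len, batch, channel)
output, (hn, cn) = lstm(x)

print(output.size())                  # [17, 1, 32]: num_directions * hidden_size in the last dim
print(hn.size(), cn.size())           # [2, 1, 16] each: (num_layers * num_directions, batch, hidden_size)

# Concatenate hn and cn into one feature vector for the sequence (assumed concatenation scheme).
feature = torch.cat([hn, cn], dim=0).permute(1, 0, 2).reshape(1, -1)   # (1, 64)
```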

When there are multiple inputs with variable lengths, I use pad_sequence and pack_padded_sequence to build a batch, i.e., a PackedSequence object. The model forwards successfully. However, when I use

h_n.view(num_layers, num_directions, batch, hidden_size).permute(2,0,1,3).contiguous().view(batch, -1)

to obtain the final feature for each input sequence, the downstream application performs poorly.
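For completeness, here is a sketch of how I build the batch and extract the features (dummy lengths 17, 11, 5; num_layers = 1, num_directions = 2, hidden_size = 16 as above; passing enforce_sorted=False is an assumption about how I handle unsorted lengths):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

lstm = nn.LSTM(64, 16, bidirectional=True)
num_layers, num_directions, hidden_size = 1, 2, 16

seqs = [torch.randn(L, 64) for L in (17, 11, 5)]       # variable-length dummy sequences
lengths = torch.tensor([s.size(0) for s in seqs])
batch = len(seqs)

padded = pad_sequence(seqs)                            # (max_len, batch, 64)
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)

output, (h_n, c_n) = lstm(packed)                      # h_n: (num_layers * num_directions, batch, hidden_size)

# The reshape quoted above: one row of features per input sequence.
feat = (h_n.view(num_layers, num_directions, batch, hidden_size)
           .permute(2, 0, 1, 3)
           .contiguous()
           .view(batch, -1))                           # (batch, num_directions * hidden_size) = (3, 32)
```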

So, here are my questions:
(1) Do (hn, cn) represent the final time step of the whole padded batch, or the final valid time step of each variable-length sequence?
(2) If they are for the padded batch: how can I get the accurate final states (h, c) for each variable-length sequence?
(3) If they are already per sequence: why does the performance drop? Are there other problems I should check?
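For reference, a minimal sanity check I could run to compare the batched h_n against the h_n from forwarding each sequence alone (dummy data; assuming a single-layer bidirectional LSTM and enforce_sorted=False):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)
lstm = nn.LSTM(64, 16, bidirectional=True)
seqs = [torch.randn(L, 1, 64) for L in (17, 11, 5)]    # each: (seq_len, batch=1, channel)

# Batched forward through a PackedSequence (enforce_sorted=False keeps the original batch order in h_n).
packed = pack_padded_sequence(pad_sequence([s.squeeze(1) for s in seqs]),
                              torch.tensor([17, 11, 5]),
                              enforce_sorted=False)
_, (h_batched, _) = lstm(packed)                       # (2, 3, 16)

# Per-sequence forward, one sequence at a time.
h_single = torch.stack([lstm(s)[1][0] for s in seqs], dim=1).squeeze(2)   # (2, 3, 16)

# True would indicate h_n already holds each sequence's own final state.
print(torch.allclose(h_batched, h_single, atol=1e-6))
```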