Please help me with pack_padded_sequence and pad_packed_sequence

I have the following code with LSTM and pack_padded_sequence and pad_packed_sequence:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

input = torch.tensor([[3, 4, 5, 5], [5, 6, 3, 0], [1, 0, 0, 0]], dtype=torch.long)
input
tensor([[3, 4, 5, 5],
        [5, 6, 3, 0],
        [1, 0, 0, 0]])

embeds = nn.Embedding(10, 3)
embeded_input = embeds(input)
embeded_input
tensor([[[ 1.4851,  1.3424, -0.2184],
         [-1.3813, -0.8538, -1.0485],
         [-0.7719, -1.1383,  0.0262],
         [-0.7719, -1.1383,  0.0262]],

        [[-0.7719, -1.1383,  0.0262],
         [-0.8364,  1.2809, -0.0589],
         [ 1.4851,  1.3424, -0.2184],
         [-1.4621,  0.2334,  0.5030]],

        [[ 0.7615, -1.4556, -0.5909],
         [-1.4621,  0.2334,  0.5030],
         [-1.4621,  0.2334,  0.5030],
         [-1.4621,  0.2334,  0.5030]]])

lengths = torch.tensor([4, 3, 1], dtype=torch.long)
lengths
tensor([4, 3, 1])

packed_input = pack_padded_sequence(embeded_input, lengths, batch_first=True)
lstm = nn.LSTM(3, 6, batch_first=True)
lstm_out, (hn, cn) = lstm(packed_input)
lstm_out
PackedSequence(data=tensor([[ 0.0794, -0.0520, -0.0133, -0.0229,  0.1579,  0.1901],
        [ 0.0462,  0.1229, -0.1156, -0.0611, -0.2268, -0.0666],
        [ 0.2431,  0.1698,  0.0972,  0.0375, -0.1374, -0.0993],
        [ 0.0808,  0.1059, -0.1316, -0.1201, -0.2913,  0.0543],
        [-0.0808,  0.1496, -0.2175, -0.2547, -0.2862,  0.0695],
        [ 0.0833,  0.1712, -0.1700, -0.1216, -0.3206, -0.0491],
        [ 0.0375,  0.0346, -0.1532, -0.2162, -0.0416,  0.1792],
        [ 0.0870,  0.1947, -0.1940, -0.1209, -0.3583, -0.0994]]), batch_sizes=tensor([3, 2, 2, 1]))
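
If I understand `batch_sizes` correctly, it records how many sequences are still active at each timestep (3 at step 0, then 2, 2, 1 for lengths [4, 3, 1]). A small check I wrote to convince myself, using stand-in random data of the same shape:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# stand-in padded batch: 3 sequences with true lengths 4, 3, 1 (sorted descending)
x = torch.randn(3, 4, 3)
lengths = torch.tensor([4, 3, 1])

packed = pack_padded_sequence(x, lengths, batch_first=True)
print(packed.batch_sizes)   # how many sequences are active at each timestep
print(packed.data.shape)    # (4 + 3 + 1) packed timesteps, 3 features each
```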

out = pad_packed_sequence(lstm_out, batch_first=True)
out
(tensor([[[ 0.0794, -0.0520, -0.0133, -0.0229,  0.1579,  0.1901],
         [ 0.0808,  0.1059, -0.1316, -0.1201, -0.2913,  0.0543],
         [ 0.0833,  0.1712, -0.1700, -0.1216, -0.3206, -0.0491],
         [ 0.0870,  0.1947, -0.1940, -0.1209, -0.3583, -0.0994]],

        [[ 0.0462,  0.1229, -0.1156, -0.0611, -0.2268, -0.0666],
         [-0.0808,  0.1496, -0.2175, -0.2547, -0.2862,  0.0695],
         [ 0.0375,  0.0346, -0.1532, -0.2162, -0.0416,  0.1792],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],

        [[ 0.2431,  0.1698,  0.0972,  0.0375, -0.1374, -0.0993],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]), tensor([4, 3, 1]))

type(out)
<class 'tuple'>

output = out[0]
output
tensor([[[ 0.0794, -0.0520, -0.0133, -0.0229,  0.1579,  0.1901],
         [ 0.0808,  0.1059, -0.1316, -0.1201, -0.2913,  0.0543],
         [ 0.0833,  0.1712, -0.1700, -0.1216, -0.3206, -0.0491],
         [ 0.0870,  0.1947, -0.1940, -0.1209, -0.3583, -0.0994]],

        [[ 0.0462,  0.1229, -0.1156, -0.0611, -0.2268, -0.0666],
         [-0.0808,  0.1496, -0.2175, -0.2547, -0.2862,  0.0695],
         [ 0.0375,  0.0346, -0.1532, -0.2162, -0.0416,  0.1792],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],

        [[ 0.2431,  0.1698,  0.0972,  0.0375, -0.1374, -0.0993],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]])

seq_lengths = out[1]  # renamed from `len` to avoid shadowing the builtin
seq_lengths
tensor([4, 3, 1])

As shown above, the input (padded with zeros) has shape (batch_size, sequence_length, embedding_dim), and the output has shape (batch_size, sequence_length, hidden_size). For every sequence in the batch, I want to compute the average output hidden state over all of its timesteps. Since each sequence in the batch has a different length, I can't simply use torch.mean(output, dim=1): that would average over the padded timesteps as well.

How can I do that in a simple way? Could anybody help me? Thanks.
By the way, I am a deep learning practitioner from China.

I came up with this question because I want the mean output hidden state for every sequence in the batch, and the sequence lengths vary within the batch.
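
One idea I had (not sure it is the simplest way) is to build a boolean mask from the lengths, zero out the padded steps, and divide each sequence's summed output by its own length. A rough sketch with stand-in tensors shaped like the session above:

```python
import torch

# stand-ins shaped like the transcript: (batch, seq_len, hidden) and lengths
output = torch.randn(3, 4, 6)
lengths = torch.tensor([4, 3, 1])

# mask[i, t] is True while timestep t is within sequence i's true length
# (pad_packed_sequence already pads with zeros, but an explicit mask is
# safer in case the padding is ever nonzero)
mask = torch.arange(output.size(1)).unsqueeze(0) < lengths.unsqueeze(1)  # (batch, seq_len)

summed = (output * mask.unsqueeze(-1).float()).sum(dim=1)  # (batch, hidden)
mean_hidden = summed / lengths.unsqueeze(1).float()        # (batch, hidden)
```

This avoids any Python loop over the batch, but I'd be glad to hear if there is a cleaner built-in way.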