Hello PyTorch community,

I would like to average the outputs of a GRU/LSTM over time. The input sequences have different lengths, so I use packing. Given the simple code below, what is the most efficient way to take the mean of the RNN outputs (the per-step outputs, not the hidden state h), either from the packed output or from the padded output? I could loop over the batch using the sequence lengths, but that would be very slow. I am looking for an efficient vectorized solution.

```
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class RNNText(nn.Module):
    def __init__(self, vocab_size, word_dim=512, embed_size=512, num_layers=1):
        super(RNNText, self).__init__()
        self.embed_size = embed_size
        self.embed = nn.Embedding(vocab_size, word_dim)
        self.rnn = nn.GRU(word_dim, embed_size, num_layers, batch_first=True)
        self.fc = nn.Linear(embed_size, embed_size)

    def forward(self, x, lengths):
        # x: (batch, max_len) token indices; lengths: true length of each sequence
        x = self.embed(x)
        packed = pack_padded_sequence(x, lengths, batch_first=True)
        # Forward propagate RNN
        out_packed, h = self.rnn(packed)
        # padded, _ = pad_packed_sequence(out_packed, batch_first=True)
        out = torch.mean(???)  # how to average only the valid time steps?
        out = self.fc(out)
        out = F.normalize(out)
        return out
```
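For reference, here is a minimal sketch of one possible vectorized approach that works on the padded output: build a boolean mask from the lengths, zero out the padded time steps, sum over time, and divide by each sequence's true length. The helper name `masked_mean` and the toy shapes are my own assumptions for illustration, not part of any PyTorch API, and I have not benchmarked this against alternatives.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def masked_mean(padded, lengths):
    """Mean over the valid time steps of a (batch, max_len, hidden) tensor.

    `masked_mean` is a hypothetical helper name, not a PyTorch function.
    """
    lengths = torch.as_tensor(lengths, device=padded.device)
    max_len = padded.size(1)
    # mask[i, t] is True when step t lies inside sequence i
    steps = torch.arange(max_len, device=padded.device)
    mask = steps[None, :] < lengths[:, None]          # (batch, max_len)
    mask = mask.unsqueeze(-1).to(padded.dtype)        # (batch, max_len, 1)
    summed = (padded * mask).sum(dim=1)               # (batch, hidden)
    return summed / lengths.unsqueeze(1).to(padded.dtype)


# Toy example: 2 sequences of lengths 5 and 2 (sorted in decreasing order)
torch.manual_seed(0)
rnn = torch.nn.GRU(4, 3, batch_first=True)
x = torch.randn(2, 5, 4)
lengths = [5, 2]

packed = pack_padded_sequence(x, lengths, batch_first=True)
out_packed, _ = rnn(packed)
# pad_packed_sequence returns a tuple: (padded tensor, lengths)
padded, _ = pad_packed_sequence(out_packed, batch_first=True)

mean_out = masked_mean(padded, lengths)               # shape (2, 3)
```

In the model above, this would presumably replace the `torch.mean(???)` line with `out = masked_mean(padded, lengths)` once the `pad_packed_sequence` call is uncommented. Since `pad_packed_sequence` pads with zeros by default, the explicit mask is redundant in that case, but it keeps the helper correct if a different `padding_value` is used.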