Mean of a Padded Sequence

Walsh · August 27, 2019, 1:56pm

Given a tensor containing a set of padded sequences with shape B x (P + 1), where the extra column holds the length of each sequence, how can I find the mean of each sequence? Thanks in advance.

Edit: Please note that the padded elements are non-zero, since they result from a previous embedding step.
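Because the padded entries are non-zero, a plain sum divided by the length would give the wrong mean, so the padding has to be masked out first. A minimal sketch of that idea follows, assuming the lengths sit in the last column, every length is at least 1, and the valid elements come first in each row. The function name `masked_mean` and the sample values are made up for illustration.

```python
import torch

def masked_mean(padded_with_lens: torch.Tensor) -> torch.Tensor:
    """Mean of each row over its first `len` entries only (hypothetical helper)."""
    values = padded_with_lens[:, :-1]        # B x P, padded sequences
    lens = padded_with_lens[:, -1].long()    # B, true length of each sequence

    # mask[b, p] is True where position p is a real element of sequence b
    positions = torch.arange(values.size(1), device=values.device)
    mask = positions.unsqueeze(0) < lens.unsqueeze(1)   # B x P

    # Zero out the padding, then divide by the true length (assumed >= 1)
    summed = (values * mask.to(values.dtype)).sum(dim=1)
    return summed / lens.to(values.dtype)

# Two sequences padded to P = 4 with a non-zero filler (9.0); last column = length
batch = torch.tensor([
    [1.0, 2.0, 3.0, 9.0, 3.0],
    [4.0, 6.0, 9.0, 9.0, 2.0],
])
means = masked_mean(batch)
print(means)
```

If the sequences are already embedded (B x P x D), the same mask can be unsqueezed on the last dimension before the multiplication.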

zhangguanheng66 · August 27, 2019, 2:14pm

You may consider nn.EmbeddingBag. With mode="mean" (the default) it computes the mean of each bag of embeddings directly, so no padding is involved.

Walsh · August 27, 2019, 2:30pm

Thanks Guanheng. Unfortunately, my sequences are not fixed length, so it seems I cannot use the EmbeddingBag approach. Do you have any other ideas?

Edit 1: This is the error I’m seeing that led me to this assumption:
“ValueError: if input is 2D, then offsets has to be None, as input is treated is a mini-batch of fixed length sequences”
Edit 2: I was incorrect; you can use EmbeddingBag with variable-length sequences. You first need to flatten the sequences into a single 1D tensor (to avoid the 2D error above) and pass offsets giving the start index of each sequence within that flattened tensor.
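For anyone landing here later, a rough sketch of the flatten-plus-offsets approach is below. It assumes the inputs are integer token indices that have not yet been embedded. The vocabulary size, embedding dimension, and sample sequences are arbitrary placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# mode="mean" averages the embeddings inside each bag (sequence)
bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode="mean")

# Two variable-length sequences of token indices, no padding required
sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]

# Flatten into a single 1D tensor: [1, 2, 3, 4, 5]
flat = torch.cat(sequences)

# offsets[i] is the start index of sequence i in `flat`: [0, 3]
lens = torch.tensor([len(s) for s in sequences])
offsets = torch.cat([torch.zeros(1, dtype=torch.long), lens.cumsum(0)[:-1]])

means = bag(flat, offsets)   # shape: (number of sequences, embedding_dim)
print(means.shape)
```

Each row of `means` should match averaging the corresponding rows of `bag.weight` by hand, which is a handy sanity check.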