How to calculate a word vector average (for batched inputs)?

Sorry for the noob question, but I can’t find an answer anywhere online.

I have a batch of zero padded inputs, representing word vector IDs. I can create an embedding layer and look up those word vector values.

How do I do something simple like add the word vectors for each row, up to word vector X, where X is different for every row?

Even something like torch.nonzero(t) returns a “not implemented for type Variable” error.

The corresponding NumPy function would be np.nonzero.

Is there a way to do this in PyTorch, without batch size == 1, which would backprop loss back to my embedding layer? Thanks!
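For reference, the per-row cutoff X can be recovered from the padding itself with a comparison and a sum, without calling nonzero at all (a minimal sketch with made-up IDs, assuming 0 is the pad ID):

```python
import torch

# Made-up zero-padded batch of word IDs; 0 is assumed to be the pad ID.
ids = torch.tensor([[7, 2, 9, 0, 0],
                    [3, 8, 0, 0, 0]])

# Per-row cutoff X: count the non-pad entries in each row.
lengths = (ids != 0).sum(dim=1)   # tensor([3, 2])
```

This stays inside ordinary tensor ops, so it works on batched inputs and doesn't break autograd.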

I would masked_fill_ the entries past the word vectors with zeros (if they aren't already zeros) and then sum along the sequence axis. If you want an average, you can then divide by the lengths tensor, suitably expand_as'ed.
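A minimal sketch of that recipe, assuming 0 is the pad ID and with made-up vocabulary/embedding sizes:

```python
import torch
import torch.nn as nn

# Made-up setup: a zero-padded batch of word IDs, where 0 is the pad ID.
ids = torch.tensor([[1, 2, 3, 0, 0],
                    [4, 5, 0, 0, 0]])                  # (batch, seq_len)
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

vecs = emb(ids)                                        # (batch, seq_len, dim)
pad = (ids == 0).unsqueeze(-1)                         # (batch, seq_len, 1)

# Fill the padded positions with zeros, then sum along the sequence axis.
summed = vecs.masked_fill(pad, 0.0).sum(dim=1)         # (batch, dim)

# For an average, divide by the per-row lengths, expand_as'ed to match.
lengths = (ids != 0).sum(dim=1, keepdim=True).float()  # (batch, 1)
mean = summed / lengths.expand_as(summed)              # (batch, dim)
```

Everything here is differentiable, so the loss backprops into the embedding layer as usual. (The out-of-place masked_fill is used so the embedding output isn't modified in place.)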


Thanks! Will check out those functions. Of course the brute force way would be to subtract out the non-masked values, etc…

I did it with several repeat operations instead. Probably not the most efficient approach, but it's correct and easy to follow; I'll rewrite it more efficiently if needed. Useful function.
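The repeat-based variant might look something like this (a sketch with made-up numbers; the summed tensor stands in for the masked per-row sums):

```python
import torch

# Made-up tensors: masked per-row sums and the true row lengths.
summed = torch.tensor([[3.0, 6.0],
                       [4.0, 8.0]])    # (batch, dim)
lengths = torch.tensor([3.0, 2.0])     # (batch,)

# Repeat the lengths out to (batch, dim) so the division is elementwise.
denom = lengths.unsqueeze(1).repeat(1, summed.size(1))
mean = summed / denom                  # (batch, dim)
```

repeat materializes the expanded tensor in memory, whereas expand_as only creates a broadcasting view, which is why the expand_as route is usually preferred; the results are the same.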