Hi there,

I have built an LSTM network to process time sequences.

Each of my sequences is one-hot encoded in a 13-dimensional embedding space (pitch chroma + ‘pad’).

All of my sequences have the same length (padded).

After putting them in batches of size bsz, I end up with 3D tensors of size (seq_len, bsz, embedding_size).

Once a batch is fed into the LSTM, I end up with outputs of size (seq_len, bsz, nb_tokens).

What I want: feed this output into a layer that performs a weighted average over a single dimension, the time dimension, so that one weight is learned per time step and the final output has size (bsz, nb_tokens).
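To make the target concrete, here is a minimal sketch of the kind of layer I have in mind. The class name `TimeWeightedAverage` is hypothetical, and passing the raw scores through a softmax is my own assumption so that the weights are non-negative and sum to 1 (a true weighted average). The random tensor only stands in for the real LSTM output.

```python
import torch
import torch.nn as nn


class TimeWeightedAverage(nn.Module):
    """Learned weighted average over the time dimension (hypothetical name)."""

    def __init__(self, seq_len):
        super().__init__()
        # One learnable score per time step; zeros give a uniform average at init.
        self.scores = nn.Parameter(torch.zeros(seq_len))

    def forward(self, x):
        # x: (seq_len, bsz, nb_tokens)
        # Assumption: softmax keeps the weights non-negative and summing to 1.
        weights = torch.softmax(self.scores, dim=0)
        # Broadcast the weights over batch and token dims, then sum over time.
        return (weights.view(-1, 1, 1) * x).sum(dim=0)  # (bsz, nb_tokens)


seq_len, bsz, nb_tokens = 20, 4, 13
lstm_out = torch.randn(seq_len, bsz, nb_tokens)  # stand-in for the LSTM output
pooled = TimeWeightedAverage(seq_len)(lstm_out)
print(pooled.shape)
```

Note that this ties the layer to a fixed seq_len, which should be fine here since all sequences are padded to the same length.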

I haven’t been able to find such a layer. I believe it could be done by concatenating along some dimension and applying an average pooling layer with suitable parameters, but so far I haven’t managed to formulate it correctly.
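For comparison, here is a rough sketch of two nearby options, under the same shape assumptions as above. A plain mean over time has no learnable weights, while a bias-free `nn.Linear` applied along the time axis learns one weight per time step. The variable names are illustrative only, and the linear weights are unconstrained, so strictly speaking this is a learned weighted sum rather than an average.

```python
import torch
import torch.nn as nn

seq_len, bsz, nb_tokens = 20, 4, 13
lstm_out = torch.randn(seq_len, bsz, nb_tokens)  # stand-in for the LSTM output

# Option 1: plain average over time (no learnable weights).
mean_pooled = lstm_out.mean(dim=0)  # (bsz, nb_tokens)

# Option 2: one learnable weight per time step via a linear map over time.
time_mix = nn.Linear(seq_len, 1, bias=False)
# Move time to the last axis so the linear layer acts on it:
# (seq_len, bsz, nb_tokens) -> (bsz, nb_tokens, seq_len) -> (bsz, nb_tokens, 1)
mixed = time_mix(lstm_out.permute(1, 2, 0)).squeeze(-1)  # (bsz, nb_tokens)

print(mean_pooled.shape, mixed.shape)
```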

Does anyone have an idea, or know of such a layer?

Thanks in advance!