Hi there,
I have built an LSTM network to process time sequences.
Each of my sequences is one-hot encoded in a 13-dimensional space (pitch chroma + ‘pad’).
All of my sequences have the same length (padded).
After putting them in batches of size bsz, I end up with 3D tensors of size (seq_len, bsz, embedding_size).
Once a batch is fed into the LSTM, I end up with outputs of size (seq_len, bsz, nb_tokens).
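To make the shapes concrete, here is a minimal sketch of the setup. The numeric values are only illustrative, and I am assuming here that the LSTM hidden size is nb_tokens (in practice a linear decoder after the LSTM could produce that last dimension instead).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes only; nb_tokens as the LSTM hidden size is an assumption.
seq_len, bsz, embedding_size, nb_tokens = 20, 4, 13, 13

# nn.LSTM expects (seq_len, bsz, input_size) when batch_first=False (the default).
lstm = nn.LSTM(input_size=embedding_size, hidden_size=nb_tokens)

# Random one-hot batch of shape (seq_len, bsz, embedding_size).
indices = torch.randint(0, embedding_size, (seq_len, bsz))
batch = F.one_hot(indices, num_classes=embedding_size).float()

outputs, (h_n, c_n) = lstm(batch)
print(batch.shape)    # torch.Size([20, 4, 13])
print(outputs.shape)  # torch.Size([20, 4, 13])
```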
What I want: feed this output into a layer that performs a weighted average over a single dimension, the time dimension, so that one weight is learned per time step and the final output has dimension (bsz, nb_tokens).
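Written out by hand, the operation I have in mind would look roughly like the sketch below. This is not an existing PyTorch layer: the name TimeWeightedAverage is hypothetical, and normalizing the weights with a softmax (so that they sum to 1) is my own assumption about what "weighted average" should mean here.

```python
import torch
import torch.nn as nn

class TimeWeightedAverage(nn.Module):
    """Hypothetical layer: learned weighted average over the time dimension."""

    def __init__(self, seq_len):
        super().__init__()
        # One learnable score per time step; zeros give a uniform average at init.
        self.logits = nn.Parameter(torch.zeros(seq_len))

    def forward(self, x):
        # x: (seq_len, bsz, nb_tokens)
        w = torch.softmax(self.logits, dim=0)   # (seq_len,), sums to 1
        return torch.einsum("t,tbk->bk", w, x)  # (bsz, nb_tokens)

x = torch.randn(20, 4, 13)          # stand-in for the LSTM outputs
pool = TimeWeightedAverage(seq_len=20)
out = pool(x)
print(out.shape)                    # torch.Size([4, 13])
```

With the weights initialized this way, the output starts out equal to the plain mean over time, and training can then move the weights away from uniform. Note that this ties the layer to a fixed seq_len, which is fine in my case since all sequences are padded to the same length.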
I haven’t been able to find such a layer. I believe it can be done by concatenating along some dimension and applying an average pooling layer with suitable parameters, but so far I haven’t been able to formulate it correctly.
Do you have any idea, or do you know of such a layer?
Thanks in advance!