Best way to add features to LSTM input (embedding)

I am running an LSTM whose input and output dimension is 100 (the number of classes). I first map each one-hot input vector to a dense vector with nn.Embedding. Each timestep in the sequence is an index into the embedding matrix, and during the forward pass the row at that index is retrieved and used as the LSTM input. Backprop then adjusts both the layer weights and the embedding weights.
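A minimal sketch of the setup described above, assuming an embedding size of 32 and an LSTM hidden size of 64 (neither is given in the post). `TokenLSTM` is a hypothetical name. Because `nn.Embedding` takes integer indices, the one-hot vector never has to be built explicitly: looking up row `i` gives the same result as multiplying the one-hot vector for `i` by the embedding matrix.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 100   # input and output size from the post
EMBED_DIM = 32      # assumed embedding size (not given in the post)
HIDDEN_DIM = 64     # assumed LSTM hidden size (not given in the post)


class TokenLSTM(nn.Module):
    """Hypothetical baseline: index -> embedding -> LSTM -> class logits."""

    def __init__(self):
        super().__init__()
        # Row i of this table is the dense vector for class i.
        self.embed = nn.Embedding(NUM_CLASSES, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, NUM_CLASSES)

    def forward(self, idx):            # idx: (batch, seq_len), int64 indices
        x = self.embed(idx)            # (batch, seq_len, EMBED_DIM)
        h, _ = self.lstm(x)            # (batch, seq_len, HIDDEN_DIM)
        return self.out(h)             # (batch, seq_len, NUM_CLASSES)


torch.manual_seed(0)
model = TokenLSTM()
idx = torch.randint(0, NUM_CLASSES, (4, 10))
logits = model(idx)
print(logits.shape)  # torch.Size([4, 10, 100])

# The lookup equals an explicit one-hot vector times the embedding matrix.
one_hot = torch.nn.functional.one_hot(idx, NUM_CLASSES).float()
same = torch.allclose(one_hot @ model.embed.weight, model.embed(idx))
```

Both `model.embed.weight` and the LSTM/linear weights are registered parameters, so a single optimizer over `model.parameters()` updates all of them during backprop.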

What if I want to add a feature to the original input? Suppose that at each timestep I have the same one-hot vector as above, plus a second one-hot vector for the new feature. I imagine that simply concatenating the two sparse vectors doesn't work, because each timestep would then have two indices instead of one. Is that correct?
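One way to see what concatenating the two one-hot vectors would actually compute: multiplying the resulting "two-hot" vector by a bias-free weight matrix is the same as looking up one row per feature and adding the two rows. A minimal sketch, assuming a hypothetical second feature with `NUM_B = 7` categories and an embedding size of 16 (both assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_A = 100   # size of the original one-hot feature (from the post)
NUM_B = 7     # assumed size of the hypothetical second one-hot feature
DIM = 16      # assumed embedding size

torch.manual_seed(0)

# One embedding table per feature.
emb_a = nn.Embedding(NUM_A, DIM)
emb_b = nn.Embedding(NUM_B, DIM)

a = torch.tensor([3, 42])   # feature A index at two timesteps
b = torch.tensor([1, 5])    # feature B index at the same two timesteps

# Concatenating the two one-hot vectors gives a vector of length NUM_A + NUM_B
# with exactly two ones in it.
two_hot = torch.cat([F.one_hot(a, NUM_A), F.one_hot(b, NUM_B)], dim=-1).float()

# Multiply by the stacked weight matrix [W_a; W_b] (a bias-free linear layer).
stacked = torch.cat([emb_a.weight, emb_b.weight], dim=0)  # (NUM_A + NUM_B, DIM)
via_two_hot = two_hot @ stacked                           # (2, DIM)

# This equals the sum of the two separate lookups.
via_sum = emb_a(a) + emb_b(b)
print(torch.allclose(via_two_hot, via_sum))  # True
```

So two indices per timestep can still be handled; the difference from the concatenation approach is that the two embeddings are summed into one vector rather than placed side by side. The same sum can be computed from a single shared table by offsetting feature B's indices by `NUM_A`, which is the kind of input `nn.EmbeddingBag(..., mode='sum')` accepts.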

Another option would be to embed each one-hot vector separately and concatenate the two embeddings before passing them to the rest of the network. This seems to be fairly standard. But how does learning work in that case? How can I make sure that both embedding spaces are learned properly during backprop, and do I need to do anything special to guarantee that? Or is this the wrong approach?
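A minimal sketch of the separate-embeddings-then-concatenate approach, assuming a hypothetical second feature with 7 categories, embedding sizes of 32 and 8, and a hidden size of 64 (all assumptions). It checks that after one `backward()` call both embedding tables have received gradient: the loss depends on each table through `torch.cat`, so autograd propagates into both.

```python
import torch
import torch.nn as nn

NUM_A, NUM_B = 100, 7   # NUM_B is an assumed size for the second feature
DIM_A, DIM_B = 32, 8    # assumed embedding sizes; they need not match
HIDDEN = 64             # assumed LSTM hidden size


class TwoFeatureLSTM(nn.Module):
    """Hypothetical model: embed each feature separately, concatenate, run the LSTM."""

    def __init__(self):
        super().__init__()
        self.emb_a = nn.Embedding(NUM_A, DIM_A)
        self.emb_b = nn.Embedding(NUM_B, DIM_B)
        self.lstm = nn.LSTM(DIM_A + DIM_B, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, NUM_A)  # predict one of the 100 classes

    def forward(self, a, b):                 # a, b: (batch, seq_len), int64
        x = torch.cat([self.emb_a(a), self.emb_b(b)], dim=-1)  # (..., DIM_A + DIM_B)
        h, _ = self.lstm(x)
        return self.out(h)


torch.manual_seed(0)
model = TwoFeatureLSTM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # one optimizer sees both tables

a = torch.randint(0, NUM_A, (4, 10))
b = torch.randint(0, NUM_B, (4, 10))
target = torch.randint(0, NUM_A, (4, 10))

logits = model(a, b)
loss = nn.functional.cross_entropy(logits.reshape(-1, NUM_A), target.reshape(-1))
opt.zero_grad()
loss.backward()

# Both tables are in the computation graph, so both receive gradient.
grad_a = model.emb_a.weight.grad.abs().sum().item()
grad_b = model.emb_b.weight.grad.abs().sum().item()
print(grad_a > 0, grad_b > 0)
opt.step()
```

Nothing beyond passing `model.parameters()` to the optimizer is required for both tables to be trained. Each table only receives gradient for the rows looked up in the current batch, so categories that appear rarely are also updated rarely.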

I would also appreciate pointers and any examples/papers (which I’m sure I’ve missed).

