For an LSTM model for sentence classification, I have text sequences for input. Using word embeddings, let’s say each token is mapped to 100D embeddings. Now if I also want to use other features, like part-of-speech, do I simply concatenate them and have 101D inputs? Doesn’t this diminish the effect of the POS tags?
Also, the embeddings are trainable and can be learned during training, while POS tags shouldn’t change. How is this possible if they’re just concatenated into one embedding layer?
I can’t seem to find a clear answer or an example describing the standard way to do this. It seems like some people concatenate, while others use two separate LSTMs then somehow combine those outputs?
So I am fairly new to this, but to the extent of my knowledge there are multiple ways to go about it:
1. Create word embeddings that already encode the POS tag: pass the corpus through a POS tagger and feed the concatenation of each word and its POS tag as input to whatever method you use to produce the word embeddings.
2. Concatenate the word embedding vector with another vector containing custom features (such as POS tags).
3. Create new word embeddings by combining the word embedding vector with a vector of other arbitrary features, e.g. via an element-wise product (a true dot product would collapse the pair to a single scalar).
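Option 2 can be sketched in a few lines of NumPy. The dimensions and the small tag set below are illustrative assumptions, not fixed values, and the random vector stands in for a learned word embedding:

```python
import numpy as np

# Hypothetical sizes: 100-dim word embedding, one-hot POS vector over a
# small illustrative tag set.
EMBED_DIM = 100
POS_TAGS = ["NOUN", "VERB", "ADJ", "DET"]

rng = np.random.default_rng(0)
word_vec = rng.normal(size=EMBED_DIM)  # stand-in for a learned embedding

def one_hot_pos(tag):
    """Encode a POS tag as a one-hot vector over POS_TAGS."""
    vec = np.zeros(len(POS_TAGS))
    vec[POS_TAGS.index(tag)] = 1.0
    return vec

# Concatenate the word vector with the POS feature vector.
combined = np.concatenate([word_vec, one_hot_pos("VERB")])
print(combined.shape)  # (104,)
```

Note that with a one-hot encoding the POS part contributes only a handful of dimensions next to the 100-dim word vector; giving the tags their own small embedding layer (as in the comment below the answer) is a common way to let the model scale their influence.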
Yeah, currently I’m using pretrained GloVe embeddings for the words and mapping the POS tags to a separate trainable embedding layer, then concatenating the two before feeding the result into the rest of the model. So that’s #2 of your suggestions.
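A minimal PyTorch sketch of this setup, assuming placeholder sizes throughout; the random `pretrained` matrix stands in for real GloVe vectors. It also answers the trainability question from the original post: the word table is frozen while the POS table stays trainable, because they are two separate embedding layers whose outputs are concatenated:

```python
import torch
import torch.nn as nn

# Placeholder sizes (assumptions for illustration).
VOCAB, POS_VOCAB, WORD_DIM, POS_DIM, HIDDEN = 1000, 20, 100, 10, 64

# Stand-in for a real GloVe matrix loaded from disk.
pretrained = torch.randn(VOCAB, WORD_DIM)

class WordPosLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        # freeze=True keeps the pretrained word vectors fixed during training.
        self.word_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
        # The POS embedding is a separate, trainable table.
        self.pos_emb = nn.Embedding(POS_VOCAB, POS_DIM)
        self.lstm = nn.LSTM(WORD_DIM + POS_DIM, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, 2)  # binary sentence classification

    def forward(self, words, pos_tags):
        # Concatenate the two embeddings per token before the LSTM.
        x = torch.cat([self.word_emb(words), self.pos_emb(pos_tags)], dim=-1)
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])

model = WordPosLSTM()
words = torch.randint(0, VOCAB, (4, 12))  # batch of 4 sequences, length 12
pos = torch.randint(0, POS_VOCAB, (4, 12))
print(model(words, pos).shape)  # torch.Size([4, 2])
```

Because gradients simply stop at the frozen table, nothing special is needed to keep the GloVe vectors unchanged while the POS embeddings are learned.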