Combine word embeddings + other features for sentence classification

For an LSTM model for sentence classification, I have text sequences for input. Using word embeddings, let’s say each token is mapped to 100D embeddings. Now if I also want to use other features, like part-of-speech, do I simply concatenate them and have 101D inputs? Doesn’t this diminish the effect of the POS tags?

Also, the word embeddings are trainable and get updated during training, while the POS tags shouldn’t change. How is that possible if they’re just concatenated into a single embedding layer?

I can’t seem to find a clear answer or an example describing the standard way to do this. It seems like some people concatenate, while others use two separate LSTMs then somehow combine those outputs?

I’m wondering the same thing - I’d appreciate it if you shared anything more you’ve found on this topic.

So I am kinda new to this, but to the extent of my knowledge there are multiple ways to go about it:

  1. Create word embeddings that already contain the POS information. That is, pass the corpus through a POS tagger and feed the concatenation of each word and its POS tag as input to whatever method you use to produce word embeddings.

  2. Concatenate the word embedding vector with another vector containing custom added features (such as POS tags).

  3. Create new word embeddings by taking the element-wise product of the word embedding vector and a vector of other arbitrary features (a dot product would collapse everything to a single scalar, so element-wise is what you’d want here).
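A shape-level sketch of option 2, using NumPy and made-up dimensions (100-dim word vectors, a 5-tag one-hot POS encoding - none of this is from the original posts):

```python
import numpy as np

# Hypothetical sizes: 100-dim word embedding, 5 POS tags one-hot encoded.
EMB_DIM, NUM_TAGS = 100, 5

word_vec = np.random.randn(EMB_DIM)  # stand-in for a learned word embedding

pos_onehot = np.zeros(NUM_TAGS)
pos_onehot[2] = 1.0                  # e.g. tag index 2 = "VERB" (made-up mapping)

# Option 2: concatenate, giving one 105-dim input per token. Note it is
# not 101-dim unless the POS feature is a single scalar - the extra width
# depends entirely on how you encode the tag.
token_input = np.concatenate([word_vec, pos_onehot])
print(token_input.shape)  # (105,)
```

This also answers the "diminishing effect" worry somewhat: the POS part occupies its own dimensions, so the network can weight it independently of the word dimensions.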


Yeah, currently I’m using pretrained GloVe embeddings for the words and mapping POS tags to a separate embedding layer, then concatenating the two before feeding them into the rest of the model. So that’s #2 of your suggestions.
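For anyone wondering what that setup looks like mechanically, here is a NumPy sketch of the two lookups and the concatenation (all sizes are invented; in a real framework the GloVe table would be loaded from the pretrained file and frozen, while the POS table stays trainable):

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, TAGSET, WORD_DIM, POS_DIM, SEQ_LEN = 1000, 17, 100, 10, 6

# Frozen lookup table standing in for pretrained GloVe vectors.
glove = rng.standard_normal((VOCAB, WORD_DIM))

# Separate, randomly initialized table for POS tags; in the real model
# this one is updated by the optimizer while the GloVe table is not.
pos_table = rng.standard_normal((TAGSET, POS_DIM)) * 0.01

word_ids = rng.integers(0, VOCAB, SEQ_LEN)  # token indices for one sentence
tag_ids = rng.integers(0, TAGSET, SEQ_LEN)  # POS tag index per token

# Look up both tables and concatenate along the feature axis:
# each timestep is now WORD_DIM + POS_DIM wide before the LSTM.
x = np.concatenate([glove[word_ids], pos_table[tag_ids]], axis=-1)
print(x.shape)  # (6, 110)
```

Keeping the two tables separate is what resolves the trainable-vs-fixed question above: you freeze one lookup and train the other, and only the concatenated result flows downstream.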

For method #2, you’d need to train an embedding layer for the POS tags, yeah? Have you done this before?

The POS embeddings could be as simple as one-hot encodings, or randomly initialized embeddings that are trainable.
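Those two variants side by side, with a toy four-tag set (the tag names and the 8-dim size are just for illustration):

```python
import numpy as np

TAGS = ["NOUN", "VERB", "ADJ", "DET"]  # toy tag set
tag_to_idx = {t: i for i, t in enumerate(TAGS)}

# Variant 1: fixed one-hot encodings - one row per tag, nothing to train.
onehot = np.eye(len(TAGS))

# Variant 2: small random initialization; in a framework these rows would
# be updated by the optimizer like any other embedding weights.
rng = np.random.default_rng(1)
dense = rng.standard_normal((len(TAGS), 8)) * 0.1

i = tag_to_idx["VERB"]
print(onehot[i])       # [0. 1. 0. 0.]
print(dense[i].shape)  # (8,)
```

One-hot is simpler and fixed by construction; the trainable variant lets the model learn which tags behave similarly, at the cost of a few extra parameters.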

I’m trying to do trainable randomly initialized embeddings, but I’m getting the error I posted about here: Creating MTGP constraints failed