Input to LSTM as a matrix or similar

Hi, I'm working on gender inference. I have data like this,
userid, text,
1, hello, world
1, how are you
…(up to 3000) gender:Female
2, whats up
2, Im here
…(up to 3000) gender:Male

The problem is that if I concatenate all of the texts for each user, the result is too long for an LSTM to process. What I am thinking is to map each sentence to a sentence embedding using BERT and average them, so that each user is represented by a single vector. But then how do I feed that vector into an LSTM model, which takes a 3D tensor as input?

Well, when you use, for example, BERT to create a sentence embedding – with or without averaging the embeddings for the same user – you no longer have sequences. That's the goal of a sentence embedding: to convert a sequence of words/tokens into a single vector. Hence, there's no need for, or even a meaningful way of, feeding this into an LSTM. Once you have the sentence embedding, you can feed it into any non-RNN network of linear or CNN layers.
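For instance, once each user is reduced to one averaged embedding, a plain feed-forward network is all you need. A minimal sketch (the 768 dimension matches BERT-base, but the hidden size, dropout, and the random stand-in for the BERT output are just placeholders):

```python
import torch
import torch.nn as nn

EMBED_DIM = 768  # BERT-base sentence embeddings are 768-dimensional

# A plain feed-forward classifier over a fixed-size user vector.
classifier = nn.Sequential(
    nn.Linear(EMBED_DIM, 128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 2),  # two classes: female / male
)

# One "user vector": the average of that user's sentence embeddings.
sentence_embeddings = torch.randn(3000, EMBED_DIM)  # stand-in for BERT output
user_vector = sentence_embeddings.mean(dim=0)       # shape: (768,)

logits = classifier(user_vector.unsqueeze(0))       # add a batch dimension
print(logits.shape)  # torch.Size([1, 2])
```

No 3D tensor is needed anywhere here, which is the point: the sequence dimension disappeared when the embeddings were averaged.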

On the other hand, I don't see any particular reason to treat all sentences of the same user as one "block". For starters, you can simply treat each (sentence, gender) pair as an individual data item. Then you can use an RNN (LSTM or GRU) to train a binary classifier. It's the most straightforward way to do it, and it would give you, if nothing else, a good baseline for any more advanced methods you want to try.
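A minimal sketch of such a per-sentence classifier, assuming the sentences arrive as padded token ID sequences (the vocabulary size, embedding and hidden dimensions are made-up placeholders):

```python
import torch
import torch.nn as nn

class SentenceClassifier(nn.Module):
    """GRU over token embeddings; last hidden state -> 2 logits."""
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 2)

    def forward(self, token_ids):       # token_ids: (batch, seq_len)
        x = self.embed(token_ids)       # (batch, seq_len, embed_dim)
        _, h = self.rnn(x)              # h: (1, batch, hidden_dim)
        return self.fc(h.squeeze(0))    # (batch, 2)

model = SentenceClassifier()
batch = torch.randint(0, 10000, (4, 20))  # 4 sentences, 20 tokens each
logits = model(batch)
print(logits.shape)  # torch.Size([4, 2])
```

Trained with `nn.CrossEntropyLoss` on (sentence, gender) pairs, this is the baseline described above; swapping `nn.GRU` for `nn.LSTM` only changes the hidden-state bookkeeping.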

Even if you use BERT to create sentence embeddings, why do you think it's important to average the embeddings over all sentences of the same user? Again, I would start by using each (sentence_embedding, gender) pair as an individual data item first.

Thanks Chris! I guess the reason why I wanna make a block is testing: with (text, gender) pairs, if a user has 1500 tweets, say, we could end up with maybe 800 "male" and 700 "female" predictions for the same user. How do I deal with a situation like this? Just count?
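For what it's worth, "just counting" is the usual approach: a majority vote over the per-sentence predictions. A sketch with the hypothetical numbers above:

```python
from collections import Counter

# Hypothetical per-sentence predictions for one user with 1500 tweets:
# 800 predicted "male", 700 predicted "female".
predictions = ["male"] * 800 + ["female"] * 700

# Majority vote: the user's label is the most common per-sentence prediction.
label, votes = Counter(predictions).most_common(1)[0]
print(label, votes)  # male 800
```

A softer variant is to average the per-sentence class probabilities instead of the hard labels, which weights confident predictions more; but the plain vote is the natural baseline.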