I have a dilemma: I’m building a sentence classifier using two different models, an FFNN and an LSTM. I used EmbeddingBag with offsets for the FFNN, and Embedding plus pack_padded_sequence for the RNN. I have two questions:
- How can I load pretrained GloVe-like embeddings into an EmbeddingBag instead of randomly initialized embeddings?
- Is pack_padded_sequence really necessary for an RNN model? If I use EmbeddingBag, don’t the offsets do the same job, since the output of EmbeddingBag is already uniform in size?
First, if you want to specify the embedding weights of an EmbeddingBag, you can use the following code:
eb = torch.nn.EmbeddingBag(glove_vocabulary_size, glove_embedding_dim)
eb.weight.data = glove_embedding_tensor # the embedding tensor loaded from the GloVe embedding files
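Alternatively, EmbeddingBag.from_pretrained copies the weights in for you. A minimal sketch, assuming a small toy tensor standing in for the matrix you would actually load from the GloVe files (the sizes and index values here are hypothetical):

```python
import torch

# Toy stand-in for a tensor loaded from a GloVe file,
# shape: (vocabulary size, embedding dimension) -- hypothetical values.
glove_embedding_tensor = torch.randn(10, 50)

# from_pretrained copies the weights; by default it freezes them,
# so pass freeze=False if you want to fine-tune the embeddings.
eb = torch.nn.EmbeddingBag.from_pretrained(glove_embedding_tensor, freeze=False)

# Flat list of token indices for two sentences, plus their start offsets.
tokens = torch.tensor([1, 4, 2, 7, 3])
offsets = torch.tensor([0, 3])  # sentence 1 = tokens[0:3], sentence 2 = tokens[3:]
out = eb(tokens, offsets)       # one pooled vector per sentence, shape (2, 50)
```

The offsets tensor marks where each sentence starts inside the flat token list, which is why no padding is needed on this path.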
Second, passing the full sequence to an RNN model may improve your classifier’s performance compared to simply pooling the embedding tensors (as EmbeddingBag does), especially when you have sufficient training data, since the RNN has more trainable parameters and can exploit word order.
FFNN and LSTM are very different architectures.
nn.EmbeddingBag gives you some aggregate (mean, sum, or max) over all relevant word embeddings – that is, over the embeddings of all the words contained in your sentence. This aggregated vector no longer captures any sequence information, and is therefore no longer a meaningful input for an LSTM/GRU.
An LSTM/GRU expects a sequence of vectors – each vector being the embedding of an individual word/token in your sentence – while nn.EmbeddingBag gives you a single vector for a whole sentence. Of course, this single vector is exactly what you can use as input for an FFNN, since FFNNs cannot handle sequences.
More specifically, using nn.EmbeddingBag will give you the same embedding vector for the following two sentences:
- “the movie was funny but not great”
- “the movie was not funny but great”
Both sentences contain exactly the same words, just in a different order.
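You can verify this order-invariance directly. A small sketch, assuming a hypothetical seven-word vocabulary covering the two example sentences and mean pooling:

```python
import torch

torch.manual_seed(0)
# Hypothetical tiny vocabulary covering the two example sentences.
vocab = {"the": 0, "movie": 1, "was": 2, "funny": 3, "but": 4, "not": 5, "great": 6}
eb = torch.nn.EmbeddingBag(num_embeddings=len(vocab), embedding_dim=8, mode="mean")

# Each sentence as a (1, num_tokens) batch of token indices; with 2D input
# and no offsets, EmbeddingBag treats each row as one bag.
s1 = torch.tensor([[vocab[w] for w in "the movie was funny but not great".split()]])
s2 = torch.tensor([[vocab[w] for w in "the movie was not funny but great".split()]])

v1, v2 = eb(s1), eb(s2)
# Same bag of words, different order -> (numerically) identical pooled vectors.
print(torch.allclose(v1, v2))
```

The word order is lost at pooling time, so no downstream layer can recover it from these vectors.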
If you want to take word order into account, you need to use LSTMs/GRUs/RNNs. In that case, you need to handle sentences/sequences of different lengths, at least as soon as you want to use batches larger than 1. Padding is one way to do this.
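Padding plus pack_padded_sequence together look roughly like this. A minimal sketch with hypothetical token indices and layer sizes:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Two hypothetical sentences of different lengths, as token-index tensors.
sents = [torch.tensor([1, 4, 2, 7, 3]), torch.tensor([5, 6, 2])]
lengths = torch.tensor([len(s) for s in sents])

emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=8)
lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

padded = pad_sequence(sents, batch_first=True)  # shape (2, 5), zero-padded
packed = pack_padded_sequence(emb(padded), lengths,
                              batch_first=True, enforce_sorted=False)
output, (h_n, c_n) = lstm(packed)
# h_n holds the final hidden state of each sequence at its true length,
# so the padding positions never contaminate the classifier's input.
```

Packing is what tells the LSTM where each sequence really ends; without it, the final hidden state of the shorter sentence would be computed over its padding tokens as well.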
In short, nn.EmbeddingBag and nn.LSTM do not meaningfully mix.