word is not passed to the embedding layer. I don’t need to, since I already pass the word embedding as input to my network. In other words, I load the wiki2vec model using gensim in another script, convert the words to embeddings, keep the other three tags (pos_tag, dep_tag and dir) exactly the same, convert everything into input tensors in the required format, and then pass it as input to the network.
I need to do this because the embeddings file is large and cannot be loaded on the machine that runs my network. I do, however, have a high-RAM server at my disposal, which I use to load the model with gensim and convert each word to its embedding. Hence I run two scripts: one that converts the text into all these tags and converts each word into its embedding, and another that contains the actual LSTM model.
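A minimal sketch of what the first (preprocessing) script might do. On the real server the model would be loaded with gensim (e.g. `KeyedVectors.load_word2vec_format(path_to_wiki2vec)`); a small dict stands in for it here, and the words, vectors and dimension are made up:

```python
# Stand-in for the loaded wiki2vec model; real vectors are 300-dimensional.
wv = {"permit": [0.1, 0.2, 0.3]}
DIM = 3

def to_embedding(word):
    """Return the pretrained vector for `word`, or zeros if out-of-vocabulary."""
    return wv.get(word, [0.0] * DIM)

def convert(example):
    """Replace the word in a (word, pos_tag, dep_tag, dir) tuple with its
    embedding, leaving the three tags exactly as they were."""
    word, pos_tag, dep_tag, direction = example
    return (tuple(to_embedding(word)), pos_tag, dep_tag, direction)

convert(("permit", "NN", "obj", "pos"))
# → ((0.1, 0.2, 0.3), 'NN', 'obj', 'pos')
```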
The other tags represent:
pos_tag is self-explanatory: it represents the POS tag of a word.
dep_tag represents the dependency label of the word.
dir represents the direction of the path and can take one of two values.
These tags are passed as indices to the model. In other words, I construct a set of all dependency/POS tags and give each an index. There is an nn.Embedding layer in my network which takes these indices and converts them into embeddings of a specified dimension. Obviously, since they aren’t actual English words, I don’t load pretrained embeddings for these tags. I don’t set requires_grad to False here, so they get updated as the model trains.
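A minimal sketch of this indexing scheme (the tag sets and embedding dimension below are made up for illustration; in PyTorch, nn.Embedding weights have requires_grad=True by default):

```python
import torch.nn as nn

# Illustrative tag sets; the real ones would be collected from the data.
pos_tags = ["DT", "JJ", "NN", "VB"]
dep_tags = ["amod", "det", "nsubj", "obj"]

# Give each tag an index.
pos2idx = {tag: i for i, tag in enumerate(pos_tags)}
dep2idx = {tag: i for i, tag in enumerate(dep_tags)}

# Randomly initialized, trainable embeddings of a chosen dimension (25 here).
pos_emb = nn.Embedding(num_embeddings=len(pos2idx), embedding_dim=25)
dep_emb = nn.Embedding(num_embeddings=len(dep2idx), embedding_dim=25)

# No pretrained weights are loaded, and requires_grad stays True (the
# default), so these embeddings are updated as the model trains.
assert pos_emb.weight.requires_grad
```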
The only place where requires_grad is set to False is for the word embeddings, since there I load a pretrained embeddings file. But now I am removing the nn.Embedding layer for the word and passing its embedding directly as input to the model.
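For contrast, here is one standard way the earlier frozen word-embedding layer could have looked (the post doesn’t show its code; nn.Embedding.from_pretrained with freeze=True is one common way to get requires_grad=False on the weights, and the weight matrix below is a random stand-in for wiki2vec):

```python
import torch
import torch.nn as nn

# Stand-in for the wiki2vec weight matrix (vocab of 1000 words, 300-d vectors).
pretrained = torch.randn(1000, 300)

# The earlier setup: a frozen embedding layer inside the model.
word_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
assert not word_emb.weight.requires_grad

# The new setup removes this layer entirely: the 300-d vector produced by the
# gensim script is passed straight into the model as input.
```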
Let me just give an example as well, to make my point clear.
My input is of the form (‘permit’, ‘NN’, ‘obj’, ‘pos’). Here “permit” is the word, “NN” the POS tag, “obj” the dependency tag and “pos” the direction (“pos” is short for positive).
This tuple is then converted to indices, say (77, 3, 12, 1). So here “permit” is the 77th word in my vocabulary, “NN” is the 3rd POS tag, “obj” is the 12th dependency tag and “pos” is the 1st direction tag.
This was the earlier format of the input.
Now instead of (77, 3, 12, 1) I have: (tuple(embedding), 3, 12, 1) where embedding is the 300-dimensional word embedding of “permit”. This is input to the network.
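Converting one such new-format example into tensors might look like this (a hypothetical sketch: the 300-d vector is random, standing in for the wiki2vec embedding of “permit”):

```python
import torch

# One example in the new format: (tuple(embedding), pos_idx, dep_idx, dir_idx).
embedding = torch.randn(300).tolist()
example = (tuple(embedding), 3, 12, 1)

word_t = torch.tensor(example[0], dtype=torch.float32)  # shape (300,), model input
pos_t = torch.tensor(example[1], dtype=torch.long)      # index into the POS embedding
dep_t = torch.tensor(example[2], dtype=torch.long)      # index into the dep embedding
dir_t = torch.tensor(example[3], dtype=torch.long)      # index into the dir embedding
```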
For pos_tag, dep_tag and dir, there are 3 separate nn.Embedding layers that are initialized randomly, with requires_grad left as True (the default). These layers take the indices (3, 12 and 1) and output the corresponding embeddings.
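Putting it together, here is a minimal sketch of how such a model might look (the tag-set sizes, embedding dimensions, hidden size, and class name are assumptions for illustration, not the post’s actual values):

```python
import torch
import torch.nn as nn

class TaggedLSTM(nn.Module):
    """Word embeddings arrive as tensors; the three tags arrive as indices."""

    def __init__(self, n_pos=20, n_dep=40, n_dir=2, tag_dim=25, hidden=128):
        super().__init__()
        # Randomly initialized, trainable tag embeddings (requires_grad=True
        # by default). No nn.Embedding layer for words anymore.
        self.pos_emb = nn.Embedding(n_pos, tag_dim)
        self.dep_emb = nn.Embedding(n_dep, tag_dim)
        self.dir_emb = nn.Embedding(n_dir, tag_dim)
        self.lstm = nn.LSTM(300 + 3 * tag_dim, hidden, batch_first=True)

    def forward(self, word_vecs, pos_idx, dep_idx, dir_idx):
        # word_vecs: (batch, seq, 300) pretrained vectors passed as input.
        tags = torch.cat([self.pos_emb(pos_idx),
                          self.dep_emb(dep_idx),
                          self.dir_emb(dir_idx)], dim=-1)
        out, _ = self.lstm(torch.cat([word_vecs, tags], dim=-1))
        return out

model = TaggedLSTM()
out = model(torch.randn(2, 5, 300),          # word embeddings, given as input
            torch.randint(0, 20, (2, 5)),    # POS tag indices
            torch.randint(0, 40, (2, 5)),    # dependency tag indices
            torch.randint(0, 2, (2, 5)))     # direction tag indices
# out has shape (2, 5, 128)
```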