word is not passed to the embedding layer. I don’t need to, since I already pass the word embedding as input to my network. In other words, I load the wiki2vec model using gensim in another script, convert the words to embeddings, keep the other three tags (pos_tag, dep_tag and dir) exactly the same, convert everything into input tensors in the required format, and then pass it as input to the network.
I need to do this because the embeddings file is large and cannot be loaded on the machine that runs my network. I do, however, have a high-RAM server at my disposal, which I use to load the model with gensim and convert each word to its embedding. Hence I run two scripts: one that converts the text into all these tags and converts each word into its embedding, and another that contains the actual LSTM model.
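A minimal sketch of what the first (preprocessing) script might do. On the real server the model would be loaded with gensim (e.g. `KeyedVectors.load_word2vec_format(path_to_wiki2vec)`); a small dict stands in for it here, and the words, vectors and dimension are made up:

```python
# Stand-in for the loaded wiki2vec model; real vectors are 300-dimensional.
wv = {"permit": [0.1, 0.2, 0.3]}
DIM = 3

def to_embedding(word):
    """Return the pretrained vector for `word`, or zeros if out-of-vocabulary."""
    return wv.get(word, [0.0] * DIM)

def convert(example):
    """Replace the word in a (word, pos_tag, dep_tag, dir) tuple with its
    embedding, leaving the three tags exactly as they were."""
    word, pos_tag, dep_tag, direction = example
    return (tuple(to_embedding(word)), pos_tag, dep_tag, direction)

convert(("permit", "NN", "obj", "pos"))
# → ((0.1, 0.2, 0.3), 'NN', 'obj', 'pos')
```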
The other tags represent:
pos_tag is self-explanatory: it represents the POS tag of a word.
dep_tag represents the dependency label of the word.
dir represents the direction of the path and can take one of two values.
These tags are passed as indices to the model. In other words, I construct a set of all dependency/POS tags and give each an index. There is an nn.Embedding layer in my network which takes these indices and converts them into embeddings of a specified dimension. Obviously, since they aren’t actual English words, I don’t load pretrained embeddings for these tags. I don’t set requires_grad to False here, so they get updated as the model trains.
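A minimal sketch of this indexing scheme (the tag sets and embedding dimension below are made up for illustration; in PyTorch, nn.Embedding weights have requires_grad=True by default):

```python
import torch.nn as nn

# Illustrative tag sets; the real ones would be collected from the data.
pos_tags = ["DT", "JJ", "NN", "VB"]
dep_tags = ["amod", "det", "nsubj", "obj"]

# Give each tag an index.
pos2idx = {tag: i for i, tag in enumerate(pos_tags)}
dep2idx = {tag: i for i, tag in enumerate(dep_tags)}

# Randomly initialized, trainable embeddings of a chosen dimension (25 here).
pos_emb = nn.Embedding(num_embeddings=len(pos2idx), embedding_dim=25)
dep_emb = nn.Embedding(num_embeddings=len(dep2idx), embedding_dim=25)

# No pretrained weights are loaded, and requires_grad stays True (the
# default), so these embeddings are updated as the model trains.
assert pos_emb.weight.requires_grad
```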
The only place where requires_grad is set to False is for the word embeddings, since there I load a pretrained embeddings file. But now I am removing the nn.Embedding layer for the word and passing its embedding directly as input to the model.
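For contrast, here is one standard way the earlier frozen word-embedding layer could have looked (the post doesn’t show its code; nn.Embedding.from_pretrained with freeze=True is one common way to get requires_grad=False on the weights, and the weight matrix below is a random stand-in for wiki2vec):

```python
import torch
import torch.nn as nn

# Stand-in for the wiki2vec weight matrix (vocab of 1000 words, 300-d vectors).
pretrained = torch.randn(1000, 300)

# The earlier setup: a frozen embedding layer inside the model.
word_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
assert not word_emb.weight.requires_grad

# The new setup removes this layer entirely: the 300-d vector produced by the
# gensim script is passed straight into the model as input.
```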
Let me just give an example as well, to make my point clear.
My input is of the form (‘permit’, ‘NN’, ‘obj’, ‘pos’). Here “permit” is the word, “NN” the POS tag, “obj” the dependency tag and “pos” the direction (“pos” is short for positive).
This tuple is then converted to indices, say (77, 3, 12, 1). So here “permit” is the 77th word in my vocabulary, “NN” is the 3rd POS tag, “obj” is the 12th dependency tag and “pos” is the 1st direction tag.
This was the earlier format of the input.
Now instead of (77, 3, 12, 1) I have: (tuple(embedding), 3, 12, 1) where embedding is the 300-dimensional word embedding of “permit”. This is input to the network.
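Converting one such new-format example into tensors might look like this (a hypothetical sketch: the 300-d vector is random, standing in for the wiki2vec embedding of “permit”):

```python
import torch

# One example in the new format: (tuple(embedding), pos_idx, dep_idx, dir_idx).
embedding = torch.randn(300).tolist()
example = (tuple(embedding), 3, 12, 1)

word_t = torch.tensor(example[0], dtype=torch.float32)  # shape (300,), model input
pos_t = torch.tensor(example[1], dtype=torch.long)      # index into the POS embedding
dep_t = torch.tensor(example[2], dtype=torch.long)      # index into the dep embedding
dir_t = torch.tensor(example[3], dtype=torch.long)      # index into the dir embedding
```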
For pos_tag, dep_tag and dir, there are 3 separate nn.Embedding layers that are initialized randomly, with requires_grad left as True (the default). These layers take the indices (3, 12 and 1) and output the corresponding embeddings.
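Putting it together, here is a minimal sketch of how such a model might look (the tag-set sizes, embedding dimensions, hidden size, and class name are assumptions for illustration, not the post’s actual values):

```python
import torch
import torch.nn as nn

class TaggedLSTM(nn.Module):
    """Word embeddings arrive as tensors; the three tags arrive as indices."""

    def __init__(self, n_pos=20, n_dep=40, n_dir=2, tag_dim=25, hidden=128):
        super().__init__()
        # Randomly initialized, trainable tag embeddings (requires_grad=True
        # by default). No nn.Embedding layer for words anymore.
        self.pos_emb = nn.Embedding(n_pos, tag_dim)
        self.dep_emb = nn.Embedding(n_dep, tag_dim)
        self.dir_emb = nn.Embedding(n_dir, tag_dim)
        self.lstm = nn.LSTM(300 + 3 * tag_dim, hidden, batch_first=True)

    def forward(self, word_vecs, pos_idx, dep_idx, dir_idx):
        # word_vecs: (batch, seq, 300) pretrained vectors passed as input.
        tags = torch.cat([self.pos_emb(pos_idx),
                          self.dep_emb(dep_idx),
                          self.dir_emb(dir_idx)], dim=-1)
        out, _ = self.lstm(torch.cat([word_vecs, tags], dim=-1))
        return out

model = TaggedLSTM()
out = model(torch.randn(2, 5, 300),          # word embeddings, given as input
            torch.randint(0, 20, (2, 5)),    # POS tag indices
            torch.randint(0, 40, (2, 5)),    # dependency tag indices
            torch.randint(0, 2, (2, 5)))     # direction tag indices
# out has shape (2, 5, 128)
```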