I want to use pre-trained word embeddings as the initial weights of the embedding layer in an encoder model. How can I achieve this? For example, in Keras we can pass a weight matrix as a parameter to the embedding layer. Is there a similar way to do this in PyTorch?
You can just assign the weights to the embedding layer, like:
embed = nn.Embedding(num_embeddings, embedding_dim)
# pretrained_weight is a numpy matrix of shape (num_embeddings, embedding_dim)
embed.weight.data.copy_(torch.from_numpy(pretrained_weight))
I usually use the following way; which one is better?
# embeddings is a torch tensor of shape (num_embeddings, embedding_dim)
embedding = nn.Embedding(embeddings.size(0), embeddings.size(1))
embedding.weight = nn.Parameter(embeddings)
I'd say they're both OK.
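As a side note, newer PyTorch versions also provide nn.Embedding.from_pretrained, which does the same thing in one call; a minimal sketch, where the random tensor just stands in for a real pre-trained matrix:

import torch
import torch.nn as nn

pretrained = torch.randn(1000, 50)                    # stand-in for a (num_embeddings, embedding_dim) matrix
embedding = nn.Embedding.from_pretrained(pretrained)  # freeze=True by default, so the weights stay fixed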
And how can we keep the embedding matrix fixed during training? I didn't find that in the docs.
Following up on @ruotianluo's answer, you can try
embed.weight.requires_grad = False
to freeze its parameters.
But when I initialize the optimizer, I get "ValueError: optimizing a parameter that doesn't require gradients".
You can use filter to remove the parameters that don't require gradients:
parameters = filter(lambda p: p.requires_grad, net.parameters())
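You then pass only the filtered parameters to the optimizer; a minimal sketch, assuming net is your model (the choice of Adam and the learning rate are just placeholders):

import torch.optim as optim

# only the parameters that still require gradients reach the optimizer
parameters = filter(lambda p: p.requires_grad, net.parameters())
optimizer = optim.Adam(parameters, lr=1e-3)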
How can we specifically use GloVe vectors, particularly in an encoder-decoder model? I'm not able to understand.
@Navneet_M_Kumar, try initialising only the vectors you need. More specifically, create a corpus vocabulary and collect the corresponding pre-trained embeddings into a numpy matrix, with each word mapped to an id. This matrix can then be passed to the nn.Embedding layer mentioned above.
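A small sketch of that recipe; vocab (the list of words in your corpus) and glove (a dict mapping words to their pre-trained vectors) are assumed to exist already:

import numpy as np
import torch
import torch.nn as nn

embedding_dim = 50
matrix = np.zeros((len(vocab), embedding_dim), dtype=np.float32)
for i, word in enumerate(vocab):          # word i gets row i of the matrix
    if word in glove:
        matrix[i] = glove[word]           # copy the pre-trained vector
    # words missing from GloVe keep the zero vector (or could be randomly initialised)

embed = nn.Embedding(len(vocab), embedding_dim)
embed.weight.data.copy_(torch.from_numpy(matrix))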
What if I want to use sentence embeddings as a whole rather than word vectors? Suppose I have the sentence embeddings ready: do I create a (number of sentences x sentence embedding dimension) matrix, map each row to a sentence id, pass this matrix to the embedding layer, and look up the sentence ids in the forward function?
Is this approach right? I'm trying to perform a type of sentence classification.
Regarding the use of pre-trained word embeddings: the indexes of the words passed to the Embedding layer should match their indexes in the pre-trained embedding (the numpy matrix), right?
I know I'm missing something.
embed = nn.Embedding(num_embeddings, embedding_dim) # this creates a layer
embed.weight.data.copy_(torch.from_numpy(pretrained_weight)) # this provides the values
I don't understand how the last operation produces a dict from which you can, given a word, retrieve its vector. It seems like we provide a matrix without saying which word each vector is mapped to. Is that the case, or does the matrix's first column hold the word that the rest of the row belongs to?
How does it know the mappings?
Yes, it's a matrix. Each row is the embedding for a word. You'll also want a dictionary: a mapping of words (strings) to the integers 0, 1, …, N.
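As a toy illustration (the words and sizes here are invented): the dictionary lives outside the module and is only used to turn words into the integer indices that the embedding layer looks up.

import torch
import torch.nn as nn

word_to_idx = {'the': 0, 'cat': 1, 'sat': 2}   # hypothetical toy vocabulary
embed = nn.Embedding(len(word_to_idx), 50)     # row i holds the vector for the word mapped to i
ids = torch.LongTensor([word_to_idx[w] for w in ['the', 'sat']])
vectors = embed(ids)                           # shape (2, 50)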
Thanks for replying, colesbury!
That makes a lot of sense now: the layer knows that row 0 holds the vector for the word that is mapped to 0. Now the puzzle is solved.
How do I let the embedding know about the dict? I wasn't sure what the code for that looks like.
I've created this:
from collections import OrderedDict

def loadEmbd():
    """Load a GloVe embedding file to build the weight matrix (gmat) and the word-to-index dict (gdict2)."""
    glovefname = '…/glove/glove.6B.50d.txt'
    gmat = []
    gdict2 = OrderedDict()
    for h, line in enumerate(open(glovefname, 'r').readlines()):
        line = line.strip().split()
        word = line[0]
        gdict2[word] = h                              # word -> row index
        vector = [float(item) for item in line[1:]]   # remaining columns are the vector
        gmat.append(vector)
    return gmat, gdict2
And after constructing the model with the embedding matrix (gmat), I tried to load the dict with:
model.load_state_dict(gdict2)
but it said:
KeyError: 'unexpected key "the" in state_dict'
Yes, the indexes of the words in your vocabulary should be the same as their indexes in your embedding numpy matrix. This is what lets the embedding layer map word tokens to vectors.
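To spell out how the pieces from loadEmbd above fit together: load_state_dict expects model parameters, not a vocabulary, so the dict is never loaded into the model at all; it is only used to convert words into indices. A rough sketch (not the poster's exact code, and the example tokens are assumed to be in the GloVe vocabulary):

import numpy as np
import torch
import torch.nn as nn

gmat, gdict2 = loadEmbd()
pretrained_weight = np.array(gmat, dtype=np.float32)

embed = nn.Embedding(*pretrained_weight.shape)
embed.weight.data.copy_(torch.from_numpy(pretrained_weight))

tokens = ['the', 'of']
ids = torch.LongTensor([gdict2[t] for t in tokens])   # the dict maps words to row indices
vectors = embed(ids)                                  # the layer maps indices to vectors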
I wrote up a simple working code example for loading GloVe vectors while working on a related project. Notebook is here. It's living in my project repo at the moment, but I would be happy to split it off if people find it helpful.
Sorry for the bump, but what might be the easiest way to freeze only parts of the embedding matrix? For example, what if I wanted to use pre-trained embeddings, but assign certain words a special custom token whose embedding I do want to train?
There are two ways I'd see doing it.
The first is to have two separate embeddings: one embedding learns, the other uses the pre-trained weights, and you select which embedding to use depending on the value of the input.
The other approach is to overwrite the pre-trained parts of the embedding at the beginning of each batch, to undo the results of the previous optimizer step (see the sketch below).
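A rough sketch of that second approach, assuming frozen_ids is a LongTensor with the row indices of the pre-trained words and pretrained_rows holds their original vectors:

import torch

def restore_pretrained_rows(embed, frozen_ids, pretrained_rows):
    # call this at the start of each batch: copy the pre-trained rows back in,
    # undoing whatever the previous optimizer step did to them
    embed.weight.data[frozen_ids] = pretrained_rows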