Can we use pre-trained word embeddings for weight initialization in nn.Embedding?


#3

I usually do it the following way. Which is better?

import torch
import torch.nn as nn

# embeddings: a float tensor of shape (vocab_size, embedding_dim)
embedding = nn.Embedding(embeddings.size(0), embeddings.size(1))
embedding.weight = nn.Parameter(embeddings)
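For comparison, here is a minimal sketch of two other common ways to do the same initialization. It assumes `embeddings` is a float tensor of shape (vocab_size, embedding_dim), and the random tensor is only a stand-in for real pre-trained vectors. `nn.Embedding.from_pretrained` is available in PyTorch 0.4 and later.

```python
import torch
import torch.nn as nn

# Stand-in for real pre-trained vectors: 10 words, 4 dimensions (hypothetical sizes).
embeddings = torch.randn(10, 4)

# Option A: create the layer, then copy the values into its existing weight in place.
emb_a = nn.Embedding(embeddings.size(0), embeddings.size(1))
with torch.no_grad():
    emb_a.weight.copy_(embeddings)

# Option B: build the layer directly from the tensor.
# freeze=False keeps the weights trainable; freeze=True would keep them fixed.
emb_b = nn.Embedding.from_pretrained(embeddings, freeze=False)

# Both layers now return the pre-trained row for each index.
ids = torch.tensor([0, 3, 7])
print(torch.equal(emb_a(ids), embeddings[ids]))  # True
print(torch.equal(emb_b(ids), embeddings[ids]))  # True
```

Option A keeps the layer's original `Parameter` object, so it also works after an optimizer already holds a reference to it.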

(Adam Paszke) #4

I’d say they’re both ok.


(xd) #5

And how can we keep the embedding matrix fixed during training? I couldn’t find that in the docs.


(Yangyu Chen) #6

Building on @ruotianluo’s answer, you can set

embed.weight.requires_grad = False

to freeze its weights.


(xd) #7

But when I initialize the optimizer, I get “ValueError: optimizing a parameter that doesn’t require gradients”.


#8

You can use filter to remove the parameters that don’t require gradients:

parameters = filter(lambda p: p.requires_grad, net.parameters())
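Putting the last few replies together, here is a minimal sketch that freezes a pre-trained embedding and passes only the trainable parameters to the optimizer. The small `Net` model, its sizes, and the random stand-in vectors are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Hypothetical model: a frozen embedding followed by a trainable linear layer."""
    def __init__(self, pretrained):
        super().__init__()
        self.embed = nn.Embedding(pretrained.size(0), pretrained.size(1))
        self.embed.weight.data.copy_(pretrained)
        self.embed.weight.requires_grad = False  # freeze the embedding matrix
        self.fc = nn.Linear(pretrained.size(1), 2)

    def forward(self, ids):
        # Average the word vectors of each sequence, then classify.
        return self.fc(self.embed(ids).mean(dim=1))

pretrained = torch.randn(20, 8)  # stand-in for real pre-trained vectors
net = Net(pretrained)

# Only hand the optimizer the parameters that still require gradients.
parameters = filter(lambda p: p.requires_grad, net.parameters())
optimizer = torch.optim.SGD(parameters, lr=0.1)

before = net.embed.weight.clone()
ids = torch.randint(0, 20, (4, 5))
loss = net(ids).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(torch.equal(net.embed.weight, before))  # True: the embedding did not change
```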


(Navneet M Kumar) #10

How can we use GloVe vectors specifically, mainly in an encoder-decoder model? I’m not able to understand how.


(Tushar Gupta) #11

@Navneet_M_Kumar try initializing only the vectors you need. More specifically, build a vocabulary from your corpus, then copy each word’s pre-trained embedding into a numpy matrix at the row given by that word’s id. This matrix can then be passed to the nn.Embedding layer mentioned above.
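A minimal sketch of that recipe, assuming the pre-trained vectors are already available as a word-to-vector dict. Here a tiny hypothetical `pretrained` dict stands in for real GloVe vectors, and words missing from it get small random vectors, which is one common choice rather than the only one.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical pre-trained vectors (in practice, loaded from a GloVe file).
pretrained = {
    "the": [0.1, 0.2, 0.3],
    "cat": [0.4, 0.5, 0.6],
    "sat": [0.7, 0.8, 0.9],
}
dim = 3

# 1. Build the corpus vocabulary: each word gets an integer id.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
word2id = {}
for sentence in corpus:
    for word in sentence:
        if word not in word2id:
            word2id[word] = len(word2id)

# 2. Fill a numpy matrix whose row i holds the vector of the word with id i.
rng = np.random.default_rng(0)
matrix = np.zeros((len(word2id), dim), dtype=np.float32)
for word, idx in word2id.items():
    if word in pretrained:
        matrix[idx] = pretrained[word]
    else:
        matrix[idx] = rng.normal(scale=0.1, size=dim)  # word without a pre-trained vector

# 3. Pass the matrix to the embedding layer.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)

ids = torch.tensor([word2id[w] for w in ["the", "cat"]])
print(embedding(ids))
```

In an encoder-decoder model, the same layer (or one built the same way from the target vocabulary) would be used as the encoder's and/or decoder's embedding.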


#12

What if I want to use sentence embeddings as a whole rather than word vectors? Suppose I have the sentence embeddings ready: do I create a (number of sentences × sentence embedding dimension) matrix and map each row to a sentence id? Then pass this matrix to the embedding layer and look up the sentence ids in the forward function?

Is this approach right? I’m trying to perform a type of sentence classification.
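That lookup-table approach can work. Here is a minimal sketch assuming the sentence embeddings are precomputed and kept fixed; `sentence_vectors` is a random stand-in and `SentenceClassifier` is a hypothetical model name. Since the lookup only returns the stored rows, indexing the sentence vectors directly and feeding them to a linear layer gives the same result.

```python
import torch
import torch.nn as nn

num_sentences, sent_dim, num_classes = 100, 16, 3
sentence_vectors = torch.randn(num_sentences, sent_dim)  # stand-in for precomputed sentence embeddings

class SentenceClassifier(nn.Module):
    """Hypothetical classifier that looks up a fixed vector per sentence id."""
    def __init__(self, vectors, num_classes):
        super().__init__()
        # Row i of the matrix is the embedding of sentence i; freeze=True keeps it fixed.
        self.lookup = nn.Embedding.from_pretrained(vectors, freeze=True)
        self.fc = nn.Linear(vectors.size(1), num_classes)

    def forward(self, sentence_ids):
        return self.fc(self.lookup(sentence_ids))  # shape: (batch, num_classes)

model = SentenceClassifier(sentence_vectors, num_classes)
batch_ids = torch.tensor([0, 5, 42])
logits = model(batch_ids)
print(logits.shape)  # torch.Size([3, 3])
```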


(Meghdad Farahmand) #13

Regarding the use of pre-trained word embeddings: the indices of the words passed to the Embedding layer should match their row indices in the pre-trained embedding (the numpy matrix), right?


(shirdu) #14

I know I’m missing something.
embed = nn.Embedding(num_embeddings, embedding_dim) # this creates a layer
embed.weight.data.copy_(torch.from_numpy(pretrained_weight)) # this provides the values

I don’t understand how the last operation provides a dict from which you can retrieve a word’s vector. It seems like we provide a matrix without saying which word each vector is mapped to. Is that the case, or does each row of this matrix start with the word its vector belongs to (or, alternatively, each column)?
How does it know the mappings?


(colesbury) #15

Yes, it’s a matrix. Each row is the embedding for a word. You’ll also want a dictionary mapping words (strings) to integers 0, 1, …, N.
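To make that concrete, a minimal sketch with made-up words and vectors: the layer itself only ever sees integer ids, and the word-to-id dictionary lives outside it, where it turns tokens into ids before the lookup.

```python
import torch
import torch.nn as nn

words = ["the", "cat", "sat"]
# Mapping of words (strings) to integers; row i of the matrix belongs to word i.
word_to_ix = {word: i for i, word in enumerate(words)}
pretrained_weight = torch.tensor([[0.1, 0.2],
                                  [0.3, 0.4],
                                  [0.5, 0.6]])

embed = nn.Embedding(len(words), 2)
embed.weight.data.copy_(pretrained_weight)

# The dictionary is applied outside the layer, before the lookup.
ids = torch.tensor([word_to_ix[w] for w in ["cat", "the"]])
print(embed(ids))  # rows 1 and 0 of pretrained_weight
```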


(shirdu) #16

Thanks for replying, colesbury!
That makes a lot of sense now: the layer knows that row 0 is associated with the word mapped to 0. Now the puzzle is solved.
How do I let the embedding know about the dict?
I wasn’t sure what the code for that would be.


(shirdu) #17

I’ve created this:
from collections import OrderedDict

def loadEmbd():
    """Load a GloVe file; return the embedding matrix and a word-to-row-index dict."""
    glovefname = '…/glove/glove.6B.50d.txt'
    gmat = []
    gdict2 = OrderedDict()
    with open(glovefname, 'r', encoding='utf-8') as f:
        for h, line in enumerate(f):
            line = line.strip().split()
            word = line[0]
            gdict2[word] = h  # the word on line h owns row h of gmat
            vector = [float(item) for item in line[1:]]
            gmat.append(vector)
    return gmat, gdict2

After constructing the model with the embedding matrix (gmat),
I tried to load the dict with:
model.load_state_dict(gdict2)

but it raised:
KeyError: 'unexpected key "the" in state_dict'


(Tushar Gupta) #18

Yes, the index of each word in your vocabulary should be the same as its row index in your embedding numpy matrix. That is how the embedding layer maps word tokens to vectors.
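As for the KeyError in the earlier post: `load_state_dict` expects parameter names such as `embedding.weight` as keys, not words, so the word dict does not belong there. A minimal sketch of how the two return values of `loadEmbd` could be used instead, with a tiny hand-written `gmat`/`gdict2` standing in for the loaded GloVe data and `Model` as a hypothetical module:

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# Tiny stand-ins for the (gmat, gdict2) pair returned by loadEmbd().
gmat = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
gdict2 = OrderedDict([("the", 0), ("cat", 1), ("sat", 2)])

class Model(nn.Module):
    """Hypothetical model holding only an embedding layer."""
    def __init__(self, weights):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(weights, freeze=False)

    def forward(self, ids):
        return self.embedding(ids)

model = Model(torch.tensor(gmat))

# state_dict keys are parameter names, which is why a word dict does not fit there.
print(list(model.state_dict().keys()))  # ['embedding.weight']

# Instead, use gdict2 to turn tokens into row indices before calling the model.
tokens = ["the", "sat"]
ids = torch.tensor([gdict2[t] for t in tokens])
vectors = model(ids)
```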


(Austin) #19

I wrote up a simple working code example for loading glove vectors while working on a related project. Notebook is here. It’s living in my project repo at the moment but I would be happy to split it off if people find it helpful.


#20

Sorry for the bump: what might be the easiest way to freeze only part of the embedding matrix? For example, I want to use pre-trained embeddings, but replace certain words with a special custom token whose embedding I want to train.


#21

I see two ways to do it:

First, use two separate embeddings: one learns, the other uses the pre-trained weights. Select which embedding to use depending on the input value.

The other approach is to overwrite the pre-trained parts of the embedding at the beginning of each batch, undoing the previous optimizer step.
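A minimal sketch of the first approach, under an assumption that is not in the original post: ids `0` to `num_pretrained - 1` are pre-trained words, and ids from `num_pretrained` upward are new trainable tokens. `PartiallyFrozenEmbedding` is a hypothetical name. The per-position selection uses `torch.where`, so it runs on the whole batch at once.

```python
import torch
import torch.nn as nn

class PartiallyFrozenEmbedding(nn.Module):
    """Hypothetical module: frozen pre-trained rows plus a small trainable table for new tokens."""
    def __init__(self, pretrained, num_new):
        super().__init__()
        self.num_pretrained = pretrained.size(0)
        self.frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)
        self.learned = nn.Embedding(num_new, pretrained.size(1))

    def forward(self, ids):
        is_new = ids >= self.num_pretrained
        # Clamp ids so each table is only indexed within its own range.
        frozen_ids = ids.clamp(max=self.num_pretrained - 1)
        new_ids = (ids - self.num_pretrained).clamp(min=0)
        frozen_vecs = self.frozen(frozen_ids)
        new_vecs = self.learned(new_ids)
        # Pick, per position, the vector from the matching table.
        return torch.where(is_new.unsqueeze(-1), new_vecs, frozen_vecs)

pretrained = torch.randn(5, 3)  # stand-in for real pre-trained vectors
emb = PartiallyFrozenEmbedding(pretrained, num_new=2)
ids = torch.tensor([[0, 4, 5, 6]])  # 5 and 6 are the new, trainable tokens
out = emb(ids)
out.sum().backward()
print(emb.frozen.weight.grad)               # None: the pre-trained table gets no gradient
print(emb.learned.weight.grad is not None)  # True
```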
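A minimal sketch of the second approach, assuming a hypothetical boolean `frozen_rows` mask that marks which vocabulary rows hold pre-trained vectors. It restores those rows right after each optimizer step, which has the same effect as restoring them at the start of the next batch.

```python
import torch
import torch.nn as nn

vocab_size, dim = 6, 3
pretrained = torch.randn(vocab_size, dim)  # stand-in for real pre-trained vectors
# Hypothetical mask: rows 0-3 are pre-trained (keep fixed), rows 4-5 are new tokens (train).
frozen_rows = torch.tensor([True, True, True, True, False, False])

# clone() so the optimizer does not modify `pretrained` itself (from_pretrained shares storage).
embedding = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)

for step in range(3):
    ids = torch.tensor([0, 2, 4, 5])
    loss = embedding(ids).pow(2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Undo the optimizer's changes to the pre-trained rows.
    with torch.no_grad():
        embedding.weight[frozen_rows] = pretrained[frozen_rows]
```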


#22

Thanks so much. Do you know if there’s an easy way to vectorize this kind of selection operation?


#23

If you have a mask of the cells that should be frozen, and two full embedding matrices, one frozen and one dynamic, you could write:

dynamic = dynamic * mask + frozen

frozen contains the values of the frozen parameters and 0 elsewhere; mask contains 0 wherever a frozen parameter should be used and 1 elsewhere.

You can build the mask and frozen matrix during initialization.
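A minimal sketch of that formula, assuming it is applied in the forward pass rather than by overwriting `dynamic` in place, so the frozen cells never receive gradient. `MaskedEmbedding`, the sizes, and the choice of trainable rows are hypothetical. `mask` and `frozen` are registered as buffers, so they move with the module but are not trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedEmbedding(nn.Module):
    """Hypothetical embedding whose effective weight is dynamic * mask + frozen."""
    def __init__(self, pretrained, trainable_rows):
        super().__init__()
        mask = torch.zeros_like(pretrained)
        mask[trainable_rows] = 1.0              # 1 where the value should be learned
        frozen = pretrained * (1 - mask)        # pre-trained values, 0 where trainable
        self.register_buffer("mask", mask)
        self.register_buffer("frozen", frozen)
        self.dynamic = nn.Parameter(pretrained.clone())

    def forward(self, ids):
        weight = self.dynamic * self.mask + self.frozen
        return F.embedding(ids, weight)

pretrained = torch.randn(5, 3)  # stand-in for real pre-trained vectors
emb = MaskedEmbedding(pretrained, trainable_rows=[3, 4])
out = emb(torch.tensor([0, 3, 4]))
out.sum().backward()
print(emb.dynamic.grad)  # rows 0-2 are zero: frozen cells get no gradient
```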