I want to use pre-trained word embeddings as the initial weights of the embedding layer in an encoder model. How can I achieve this? For example, in Keras we can pass a weight matrix as a parameter to the embedding layer. Is there a similar way to do this in PyTorch?
You can just assign the weights to the embedding layer, like:
embed = nn.Embedding(num_embeddings, embedding_dim)
# pretrained_weight is a numpy matrix of shape (num_embeddings, embedding_dim)
embed.weight.data.copy_(torch.from_numpy(pretrained_weight))
I usually use the following way; which one is better?
# embeddings is a torch tensor of shape (num_embeddings, embedding_dim)
embedding = nn.Embedding(embeddings.size(0), embeddings.size(1))
embedding.weight = nn.Parameter(embeddings)
I'd say they're both OK.
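As a side note, newer PyTorch versions also provide nn.Embedding.from_pretrained, which does the same thing in one call; a minimal sketch, where the random tensor just stands in for a real pre-trained matrix:

import torch
import torch.nn as nn

pretrained = torch.randn(1000, 50)                    # stand-in for a (num_embeddings, embedding_dim) matrix
embedding = nn.Embedding.from_pretrained(pretrained)  # freeze=True by default, so the weights stay fixed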
And how can we keep the embedding matrix fixed during training? I didn't find that in the docs.
Following up on @ruotianluo's answer, you can try
embed.weight.requires_grad = False
to freeze its parameters.
But when I initialize the optimizer, I get "ValueError: optimizing a parameter that doesn't require gradients".
You can use filter to remove the parameters that don't require gradients:
parameters = filter(lambda p: p.requires_grad, net.parameters())
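You then pass only the filtered parameters to the optimizer; a minimal sketch, assuming net is your model (the choice of Adam and the learning rate are just placeholders):

import torch.optim as optim

# only the parameters that still require gradients reach the optimizer
parameters = filter(lambda p: p.requires_grad, net.parameters())
optimizer = optim.Adam(parameters, lr=1e-3)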
How can we specifically use GloVe vectors, particularly in an encoder-decoder model? I'm not able to understand.
@Navneet_M_Kumar, try initialising only the vectors you need. More specifically, create a corpus vocabulary and collect the corresponding pre-trained embeddings into a numpy matrix, with each word mapped to an id. This matrix can then be passed to the nn.Embedding layer mentioned above.
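A small sketch of that recipe; vocab (the list of words in your corpus) and glove (a dict mapping words to their pre-trained vectors) are assumed to exist already:

import numpy as np
import torch
import torch.nn as nn

embedding_dim = 50
matrix = np.zeros((len(vocab), embedding_dim), dtype=np.float32)
for i, word in enumerate(vocab):          # word i gets row i of the matrix
    if word in glove:
        matrix[i] = glove[word]           # copy the pre-trained vector
    # words missing from GloVe keep the zero vector (or could be randomly initialised)

embed = nn.Embedding(len(vocab), embedding_dim)
embed.weight.data.copy_(torch.from_numpy(matrix))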
What if I want to use sentence embeddings as a whole rather than word vectors? Suppose I have the sentence embeddings ready: do I create a (number of sentences x sentence embedding dimension) matrix, map each row to a sentence id, pass this matrix to the embedding layer, and look up the sentence ids in the forward function?
Is this approach right? I'm trying to perform a type of sentence classification.
Regarding the use of pre-trained word embeddings: the indexes of the words passed to the Embedding layer should match their indexes in the pre-trained embedding (the numpy matrix), right?
I know I'm missing something.
embed = nn.Embedding(num_embeddings, embedding_dim) # this creates a layer
embed.weight.data.copy_(torch.from_numpy(pretrained_weight)) # this provides the values
I don't understand how the last operation produces a dict from which you can, given a word, retrieve its vector. It seems like we provide a matrix without saying which word each vector is mapped to. Is that the case, or does the matrix's first column hold the word that the rest of the row belongs to?
How does it know the mappings?
Yes, it's a matrix. Each row is the embedding for a word. You'll also want a dictionary: a mapping of words (strings) to the integers 0, 1, …, N.
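As a toy illustration (the words and sizes here are invented): the dictionary lives outside the module and is only used to turn words into the integer indices that the embedding layer looks up.

import torch
import torch.nn as nn

word_to_idx = {'the': 0, 'cat': 1, 'sat': 2}   # hypothetical toy vocabulary
embed = nn.Embedding(len(word_to_idx), 50)     # row i holds the vector for the word mapped to i
ids = torch.LongTensor([word_to_idx[w] for w in ['the', 'sat']])
vectors = embed(ids)                           # shape (2, 50)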
Thanks for replying, colesbury!
That makes a lot of sense now: the layer knows that row 0 holds the vector for the word that is mapped to 0. Now the puzzle is solved.
How do I let the embedding know about the dict? I wasn't sure what the code for that looks like.
I've created this:
from collections import OrderedDict

def loadEmbd():
    """Load a GloVe embedding file to build the weight matrix (gmat) and the word-to-index dict (gdict2)."""
    glovefname = '…/glove/glove.6B.50d.txt'
    gmat = []
    gdict2 = OrderedDict()
    for h, line in enumerate(open(glovefname, 'r').readlines()):
        line = line.strip().split()
        word = line[0]
        gdict2[word] = h                              # word -> row index
        vector = [float(item) for item in line[1:]]   # remaining columns are the vector
        gmat.append(vector)
    return gmat, gdict2
And after constructing the model with the embedding matrix (gmat), I tried to load the dict with:
model.load_state_dict(gdict2)
but it said:
KeyError: 'unexpected key "the" in state_dict'
Yes, the indexes of the words in your vocabulary should be the same as their indexes in your embedding numpy matrix. This is what lets the embedding layer map word tokens to vectors.
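To spell out how the pieces from loadEmbd above fit together: load_state_dict expects model parameters, not a vocabulary, so the dict is never loaded into the model at all; it is only used to convert words into indices. A rough sketch (not the poster's exact code, and the example tokens are assumed to be in the GloVe vocabulary):

import numpy as np
import torch
import torch.nn as nn

gmat, gdict2 = loadEmbd()
pretrained_weight = np.array(gmat, dtype=np.float32)

embed = nn.Embedding(*pretrained_weight.shape)
embed.weight.data.copy_(torch.from_numpy(pretrained_weight))

tokens = ['the', 'of']
ids = torch.LongTensor([gdict2[t] for t in tokens])   # the dict maps words to row indices
vectors = embed(ids)                                  # the layer maps indices to vectors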
I wrote up a simple working code example for loading GloVe vectors while working on a related project. Notebook is here. It's living in my project repo at the moment, but I would be happy to split it off if people find it helpful.
Sorry for the bump, but what might be the easiest way to freeze only parts of the embedding matrix? For example, what if I wanted to use pre-trained embeddings, but assign certain words a special custom token whose embedding I do want to train?
There are two ways I'd see doing it.
The first is to have two separate embeddings: one embedding learns, the other uses the pre-trained weights, and you select which embedding to use depending on the value of the input.
The other approach is to overwrite the pre-trained parts of the embedding at the beginning of each batch, to undo the results of the previous optimizer step (see the sketch below).
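A rough sketch of that second approach, assuming frozen_ids is a LongTensor with the row indices of the pre-trained words and pretrained_rows holds their original vectors:

import torch

def restore_pretrained_rows(embed, frozen_ids, pretrained_rows):
    # call this at the start of each batch: copy the pre-trained rows back in,
    # undoing whatever the previous optimizer step did to them
    embed.weight.data[frozen_ids] = pretrained_rows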