Finetuning embeddings - nn.Embedding vs. nn.Embedding.from_pretrained

I have been working with pretrained embeddings (GloVe) and would like to allow these to be finetuned. I currently use embeddings like this:

    word_embeddingsA = nn.Embedding(vocab_size, embedding_length)
    word_embeddingsA.weight = nn.Parameter(TEXT.vocab.vectors, requires_grad=False)

Should I simply set requires_grad=True to allow the embeddings to be trained? Or should I do something like this:

    word_embeddingsA = nn.Embedding.from_pretrained(TEXT.vocab.vectors, freeze=False)

Are these equivalent, and do I have a way to check that the embeddings are getting trained?

The approaches should yield the same result (if you use requires_grad=True in the first approach).
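
As a quick sanity check of that equivalence, here is a minimal sketch. It assumes TEXT.vocab.vectors is a float tensor of shape [vocab_size, embedding_length]; the random `vectors` tensor below is only a stand-in for it, and the sizes are made up.

```python
import torch
import torch.nn as nn

# Stand-in for TEXT.vocab.vectors (assumption: float tensor [vocab_size, embedding_length])
torch.manual_seed(0)
vectors = torch.randn(10, 4)

# Approach 1: create the layer, then overwrite its weight with a trainable Parameter
emb_a = nn.Embedding(10, 4)
emb_a.weight = nn.Parameter(vectors.clone(), requires_grad=True)

# Approach 2: from_pretrained with freeze=False
emb_b = nn.Embedding.from_pretrained(vectors.clone(), freeze=False)

# Both layers should hold the same values and both should be trainable
same_values = torch.equal(emb_a.weight, emb_b.weight)
both_trainable = emb_a.weight.requires_grad and emb_b.weight.requires_grad
print(same_values, both_trainable)
```

Note that from_pretrained defaults to freeze=True, so you have to pass freeze=False explicitly if you want finetuning.
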
To make sure this layer is trained, you could check the gradients after the backward call by printing the .grad attribute of the embedding weight (e.g. word_embeddingsA.weight.grad),
and you should see valid gradients.
If you are seeing None as the return value, the computation graph might have been detached at some point.
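
A small sketch of that check, assuming a standalone embedding layer, a toy batch of indices, and a dummy loss (none of these come from your model, they are only there to produce gradients):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vectors = torch.randn(10, 4)  # hypothetical stand-in for TEXT.vocab.vectors
emb = nn.Embedding.from_pretrained(vectors.clone(), freeze=False)

print(emb.weight.grad)  # no backward call yet, so there is no gradient

idx = torch.tensor([[1, 3, 3]])  # toy batch of token indices
loss = emb(idx).sum()            # dummy loss, only used to get gradients
loss.backward()

grad = emb.weight.grad
print(grad[1], grad[3], grad[0])  # rows for used indices are non-zero, unused rows are zero

# One optimizer step should then change the rows that received gradients
before = emb.weight.detach().clone()
torch.optim.SGD(emb.parameters(), lr=0.1).step()
changed = not torch.equal(before, emb.weight.detach())
print(changed)
```

Only the rows of indices that appear in the batch get a non-zero gradient, so comparing the weights before and after an optimizer step is a second way to confirm that the embeddings are being updated.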

Ok, thanks. If I now want to run inference using the trained weights, can I still do this:

    test_sen1 = TEXT.preprocess(test_sen1)
    test_sen1 = [[TEXT.vocab.stoi[x] for x in test_sen1]]
    test_sen1 = np.asarray(test_sen1)
    test_sen1 = torch.LongTensor(test_sen1)
    test_tensor1 = Variable(test_sen1, volatile=True)
    output = model(test_tensor1,1)

I suppose this may still be ok as the TEXT object is just supplying the indices?

I'm not familiar enough with torchtext, unfortunately.
Could you test this code snippet using some training examples and run a sanity check to see whether the predictions are what you expect?

PS: Variables are deprecated since PyTorch 0.4, so you can use tensors now. :wink:
To save memory, wrap the inference code in a with torch.no_grad() block.
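
Something along these lines should work as a sketch. Since I can't run torchtext here, the `stoi` dict and `ToyModel` below are hypothetical stand-ins for TEXT.vocab.stoi and your trained model, and the sentence is made up:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a tiny vocabulary instead of TEXT.vocab.stoi
# and a toy classifier instead of the trained model from this thread.
stoi = {"<unk>": 0, "the": 1, "movie": 2, "was": 3, "great": 4}

class ToyModel(nn.Module):
    def __init__(self, vocab_size, embedding_length, num_classes):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_length)
        self.fc = nn.Linear(embedding_length, num_classes)

    def forward(self, x):
        # Average the word vectors over the sequence, then classify
        return self.fc(self.word_embeddings(x).mean(dim=1))

model = ToyModel(len(stoi), 8, 2)
model.eval()  # switch layers such as dropout to inference behavior

tokens = "the movie was great".split()
indices = [[stoi.get(tok, stoi["<unk>"]) for tok in tokens]]
test_tensor = torch.tensor(indices, dtype=torch.long)  # plain tensor, no Variable

with torch.no_grad():  # no graph is built, which saves memory
    output = model(test_tensor)

print(output.shape, output.requires_grad)
```

The plain LongTensor replaces Variable(..., volatile=True), and the no_grad block takes over the role that volatile=True used to play.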