Finetuning embeddings - nn.Embedding vs. nn.Embedding.from_pretrained

I have been working with pretrained embeddings (GloVe) and would like to allow them to be finetuned. I currently use the embeddings like this:

    word_embeddingsA = nn.Embedding(vocab_size, embedding_length)
    word_embeddingsA.weight = nn.Parameter(TEXT.vocab.vectors, requires_grad=False)

Should I simply set requires_grad=True to allow the embeddings to be trained? Or should I do something like this:

    word_embeddingsA = nn.Embedding.from_pretrained(TEXT.vocab.vectors, freeze=False)

Are these two approaches equivalent, and is there a way to check that the embeddings are actually being trained?

Both approaches should yield the same result (provided you set requires_grad=True in the first one).
To make sure this layer is being trained, you can check its gradients after the backward call via:

    print(model.word_embeddings.weight.grad)

and you should see valid gradients.
If you are seeing None as the return value, the computation graph might have been detached at some point.
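As a quick sanity check, here is a minimal, self-contained sketch (using made-up sizes and random vectors standing in for TEXT.vocab.vectors) showing that both initializations match and that gradients reach the embedding weights after backward:

    import torch
    import torch.nn as nn

    # Made-up sizes and random vectors standing in for TEXT.vocab.vectors
    vocab_size, embedding_length = 100, 50
    pretrained = torch.randn(vocab_size, embedding_length)

    # Approach 1: assign the weights manually and keep them trainable
    embA = nn.Embedding(vocab_size, embedding_length)
    embA.weight = nn.Parameter(pretrained.clone(), requires_grad=True)

    # Approach 2: load them via from_pretrained with freeze=False
    embB = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

    print(torch.equal(embA.weight, embB.weight))  # True -> same initialization

    # Dummy forward/backward pass to confirm gradients reach the weights
    idx = torch.randint(0, vocab_size, (4, 7))  # fake batch of token indices
    embA(idx).sum().backward()
    print(embA.weight.grad is not None)  # True -> the embedding will be updated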


OK, thanks. If I now want to run inference using the trained weights, can I still do:

test_sen1 = TEXT.preprocess(test_sen1)
test_sen1 = [[TEXT.vocab.stoi[x] for x in test_sen1]]
test_sen1 = np.asarray(test_sen1)
test_sen1 = torch.LongTensor(test_sen1)
test_tensor1 = Variable(test_sen1, volatile=True)
output = model(test_tensor1,1)

I suppose this should still be OK, as the TEXT object is just supplying the indices?

Unfortunately, I’m not familiar enough with torchtext to say for sure.
Could you test this code snippet on some training examples and run a sanity check to see if the predictions are as expected?

PS: Variables have been deprecated since PyTorch 0.4, so you can use plain tensors now. :wink:
To save memory during inference, wrap the code in a with torch.no_grad() block.
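Something along these lines should work (keeping your model and TEXT objects from above; calling model.eval() is an extra step that is usually a good idea for inference):

    import numpy as np
    import torch

    # Same inference as above, but without Variable/volatile (PyTorch >= 0.4);
    # model, TEXT, and test_sen1 are assumed to exist as in your snippet
    test_sen1 = TEXT.preprocess(test_sen1)
    test_sen1 = [[TEXT.vocab.stoi[x] for x in test_sen1]]
    test_tensor1 = torch.LongTensor(np.asarray(test_sen1))

    model.eval()  # disable dropout / use batchnorm running stats
    with torch.no_grad():  # no graph is built, which saves memory
        output = model(test_tensor1, 1)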