Finetuning embeddings - nn.Embedding vs. nn.Embedding.from_pretrained

I have been working with pretrained embeddings (GloVe) and would like to allow them to be finetuned. I currently use the embeddings like this:

    word_embeddingsA = nn.Embedding(vocab_size, embedding_length)
    word_embeddingsA.weight = nn.Parameter(TEXT.vocab.vectors, requires_grad=False)

Should I simply set requires_grad=True to allow the embeddings to be trained? Or should I do something like this:

    word_embeddingsA = nn.Embedding.from_pretrained(TEXT.vocab.vectors, freeze=False)

Are these two approaches equivalent, and is there a way to check that the embeddings are actually being trained?

Both approaches should yield the same result (provided you set requires_grad=True in the first one).
To make sure this layer is being trained, you can check its gradients after the backward call via:

    print(model.word_embeddings.weight.grad)

and you should see valid gradients.
If you are seeing None as the return value, the computation graph might have been detached at some point.
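As a quick sanity check, here is a minimal, self-contained sketch (using made-up sizes and random vectors standing in for TEXT.vocab.vectors) showing that both initializations match and that gradients reach the embedding weights after backward:

    import torch
    import torch.nn as nn

    # Made-up sizes and random vectors standing in for TEXT.vocab.vectors
    vocab_size, embedding_length = 100, 50
    pretrained = torch.randn(vocab_size, embedding_length)

    # Approach 1: assign the weights manually and keep them trainable
    embA = nn.Embedding(vocab_size, embedding_length)
    embA.weight = nn.Parameter(pretrained.clone(), requires_grad=True)

    # Approach 2: load them via from_pretrained with freeze=False
    embB = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

    print(torch.equal(embA.weight, embB.weight))  # True -> same initialization

    # Dummy forward/backward pass to confirm gradients reach the weights
    idx = torch.randint(0, vocab_size, (4, 7))  # fake batch of token indices
    embA(idx).sum().backward()
    print(embA.weight.grad is not None)  # True -> the embedding will be updated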


OK, thanks. If I now want to run inference using the trained weights, can I still do:

test_sen1 = TEXT.preprocess(test_sen1)
test_sen1 = [[TEXT.vocab.stoi[x] for x in test_sen1]]
test_sen1 = np.asarray(test_sen1)
test_sen1 = torch.LongTensor(test_sen1)
test_tensor1 = Variable(test_sen1, volatile=True)
output = model(test_tensor1,1)

I suppose this should still be OK, as the TEXT object is just supplying the indices?

Unfortunately, I’m not familiar enough with torchtext to say for sure.
Could you test this code snippet on some training examples and run a sanity check to see if the predictions are as expected?

PS: Variables have been deprecated since PyTorch 0.4, so you can use plain tensors now. :wink:
To save memory during inference, wrap the code in a with torch.no_grad() block.
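Something along these lines should work (keeping your model and TEXT objects from above; calling model.eval() is an extra step that is usually a good idea for inference):

    import numpy as np
    import torch

    # Same inference as above, but without Variable/volatile (PyTorch >= 0.4);
    # model, TEXT, and test_sen1 are assumed to exist as in your snippet
    test_sen1 = TEXT.preprocess(test_sen1)
    test_sen1 = [[TEXT.vocab.stoi[x] for x in test_sen1]]
    test_tensor1 = torch.LongTensor(np.asarray(test_sen1))

    model.eval()  # disable dropout / use batchnorm running stats
    with torch.no_grad():  # no graph is built, which saves memory
        output = model(test_tensor1, 1)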