I have been working with torchtext word embeddings (ex. GloVe) and now I would like to experiment with sentence embeddings.
I am using FAIR LASER: https://pypi.org/project/laserembeddings/
With word embeddings, I would carry out the following:
Load an embedding layer with Embedding.from_pretrained()
Feed it into a linear layer, an LSTM, and some further linear layers.
I have simply replaced the word embeddings with the FAIR sentence embeddings, but the network is not learning. Is this an incorrect way to approach this?
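For context, here is a minimal sketch of the word-embedding pipeline described above (the vocabulary size, dimensions, and class names are illustrative, with random tensors standing in for real GloVe vectors):

```python
import torch
import torch.nn as nn

# Stand-in for pretrained vectors (e.g. GloVe): vocab of 1000, dim 100.
pretrained = torch.randn(1000, 100)

class Classifier(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # freeze=False keeps the embedding weights trainable
        self.embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.lstm = nn.LSTM(100, 64, batch_first=True)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # (batch, seq, 100)
        _, (h, _) = self.lstm(x)        # h: (num_layers, batch, 64)
        return self.fc(h[-1])           # (batch, num_classes)

model = Classifier()
logits = model(torch.randint(0, 1000, (4, 12)))
print(logits.shape)  # torch.Size([4, 2])
```

Note that LASER produces a fixed vector per sentence rather than per token, so the input to the rest of the network changes shape when the two are swapped.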
When you “freeze” a model (or part of a model, or some of its layers), you effectively disable learning for it. Setting requires_grad=False means that no gradients will be calculated for that part of the model, so the optimizer will never update those weights. If you want the pretrained weights to be trainable (i.e. fine-tunable), you have to disable freezing.
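A small sketch of this behavior (the tensors are random placeholders). Note that nn.Embedding.from_pretrained freezes by default (freeze=True), which is easy to miss:

```python
import torch
import torch.nn as nn

# Default is freeze=True: weights are frozen, no gradients computed.
emb = nn.Embedding.from_pretrained(torch.randn(10, 4))
print(emb.weight.requires_grad)  # False

# Unfreeze to allow fine-tuning:
emb.weight.requires_grad_(True)
print(emb.weight.requires_grad)  # True

# Equivalently, load with freeze=False from the start:
emb2 = nn.Embedding.from_pretrained(torch.randn(10, 4), freeze=False)
print(emb2.weight.requires_grad)  # True
```

A common variant is to unfreeze only part of a model while leaving the rest frozen, by toggling requires_grad per parameter.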