I am trying to train a transformer model after extending its vocabulary. I want to keep the original weights frozen and train only the weights associated with the newly added tokens. I was thinking of doing something like this:
processor = load_processor()  # load the processor (placeholder)
model = load_model()          # load the pretrained model (placeholder)

# Freeze all existing parameters
for param in model.parameters():
    param.requires_grad = False

# Add the new tokens and resize the embedding matrix to match
processor.tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(processor.tokenizer))
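One thing I am unsure about: resize_token_embeddings replaces the embedding matrix with a new, trainable one, so the freeze loop above probably would not apply to it, and requires_grad works per tensor, not per row, so I cannot freeze only the original rows that way. As an alternative I considered zeroing out the gradients of the original rows with a hook. Here is a rough sketch of what I mean, assuming a standard PyTorch/Hugging Face-style model (original_vocab_size is a value I would record before adding the tokens):

import torch

original_vocab_size = len(processor.tokenizer)  # record this BEFORE add_tokens / resize

# ... add tokens and resize as above ...

embeddings = model.get_input_embeddings()
embeddings.weight.requires_grad = True  # keep the (resized) embedding trainable

def zero_grad_for_old_rows(grad):
    # Zero the gradient for the original vocabulary rows so that
    # only the newly added rows receive updates
    mask = torch.zeros_like(grad)
    mask[original_vocab_size:] = 1.0
    return grad * mask

embeddings.weight.register_hook(zero_grad_for_old_rows)

Even with the hook, I suspect an optimizer with weight decay or momentum could still move the old rows, so maybe I would also need to pass only the embedding parameters to the optimizer.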
Is either of these approaches valid? If not, what other options do I have?