I am struggling with the following problem: how best to fine-tune a pretrained language model such as BERT, BART, or RoBERTa when its architecture or data flow has been customized.
I would like to hear your opinions and experience, and hopefully start a friendly discussion on the topic.
Let me explain the problem in more detail. I have a pretrained BART model, which is a model for summarization (and text generation). I want to alter its inner structure in different ways to study its behavior:
- Transforming the data flow without changing the weights, so the model still fully matches the checkpoint. I guess the weights should then be fine-tuned to work with the new data flow. For example, I want the encoder to process several inputs before the decoder is called, instead of calling the decoder after each input.
- Adding new custom layers with new weights to train. For example, I want to add a linear projection (nn.Linear()) after the encoder. In this case, I have a new matrix of freshly initialized weights that has to be trained from scratch.
- Finally, I can alter data flow and add new custom layers at the same time.
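To make the second scenario concrete, here is a minimal PyTorch sketch of what I mean by adding a projection after the encoder. The toy encoder is only a stand-in for the real pretrained BART encoder, and the names EncoderWithProjection and toy_encoder are made up for illustration. Initializing the projection to the identity is my own assumption, not an established recipe: the idea is that the modified model starts out producing the same outputs as the pretrained one.

```python
import torch
from torch import nn


class EncoderWithProjection(nn.Module):
    """Wraps an encoder and applies a new linear projection to its output."""

    def __init__(self, encoder: nn.Module, hidden_size: int):
        super().__init__()
        self.encoder = encoder                          # pretrained weights
        self.proj = nn.Linear(hidden_size, hidden_size)  # fresh weights
        # Assumption: start the new layer as an identity map so that the
        # wrapped model initially reproduces the pretrained encoder output.
        nn.init.eye_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.encoder(x))


torch.manual_seed(0)
hidden_size = 16
# Hypothetical stand-in for a pretrained encoder (not the real BART encoder).
toy_encoder = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
wrapped = EncoderWithProjection(toy_encoder, hidden_size)

x = torch.randn(2, 5, hidden_size)  # (batch, sequence, hidden)
with torch.no_grad():
    same_at_init = torch.allclose(wrapped(x), toy_encoder(x), atol=1e-6)
```

With a random initialization instead of the identity, the decoder would see encoder outputs it was never trained on from the very first step, which is part of why I am worried about convergence.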
In all of these scenarios, I need an efficient strategy to fine-tune the model. If the learning rate is too high or too low, the model does not converge.
I thought it could be useful to use a different learning rate for the custom layer, maybe a different scheduler, or just freeze some layers. The possibilities are almost infinite.
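Here is a rough sketch of the options I have in mind, written in plain PyTorch. The small model is a toy stand-in for a pretrained encoder plus a new projection layer, the learning rates and the 100-step warmup are arbitrary placeholder values, and the freeze helper is a hypothetical name. It shows optimizer parameter groups with separate learning rates, a separate schedule per group via LambdaLR, and freezing by turning off requires_grad.

```python
import torch
from torch import nn


def freeze(module: nn.Module) -> None:
    """Exclude a module's parameters from gradient updates."""
    for p in module.parameters():
        p.requires_grad = False


# Toy stand-in: "encoder" plays the pretrained part, "proj" the new layer.
model = nn.ModuleDict({
    "encoder": nn.Linear(16, 16),
    "proj": nn.Linear(16, 16),
})

# Option 1: one optimizer, two parameter groups with different learning rates.
# The values are placeholders, not recommendations.
optimizer = torch.optim.AdamW([
    {"params": model["encoder"].parameters(), "lr": 1e-5},  # pretrained
    {"params": model["proj"].parameters(), "lr": 1e-3},     # fresh weights
])
base_lrs = [group["lr"] for group in optimizer.param_groups]

# Option 2: a different schedule per group. LambdaLR accepts one function
# per parameter group; each returns a multiplier for that group's base lr.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=[
        lambda step: 1.0,                          # constant for pretrained
        lambda step: min(1.0, (step + 1) / 100),   # linear warmup for new
    ],
)

# Option 3: freeze part of the network entirely (shown on a separate module
# so it does not interfere with the optimizer above).
frozen_part = nn.Linear(16, 16)
freeze(frozen_part)
```

In a training loop, optimizer.step() would be followed by scheduler.step() once per step. What I do not know is which combination of these actually works well in practice.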
Do you have any experience to share on these topics? It could be helpful for me and for everyone in this community.
Thank you. Have a nice day.