I am using the transformers library by Hugging Face.
It relies on transfer learning extensively: for example, you load BERT's pre-trained weights with
from_pretrained, and going through its fine-tuning code you can save the new model weights and other hyperparameters with save_pretrained.
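
For context, this is roughly how I am using it (a minimal sketch; the model name and output directory are just examples):

from transformers import BertForSequenceClassification, BertTokenizer

# Download and load the published pre-trained BERT weights
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# ... fine-tune `model` on a downstream task ...

# Save the fine-tuned weights, config and tokenizer files to a directory
model.save_pretrained("./my-finetuned-bert")
tokenizer.save_pretrained("./my-finetuned-bert")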
My doubt is this: in the
modeling_bert code there is no explicit code that takes the pre-trained weights into account and then trains; it just takes the attention matrices and puts them into a feed-forward network, e.g.:
# e.g. in BertLMPredictionHead:
self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
And here I do not see any kind of “inheritance” of the pre-trained weights…
So is this handled internally by PyTorch, or am I missing something?
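
For comparison, this is the kind of explicit weight copying I was expecting to find somewhere (a hypothetical sketch in plain PyTorch, not actual library code; the checkpoint filename is made up):

import torch
import torch.nn as nn

# A toy module with the same decoder layer as above
class Head(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)

head = Head(hidden_size=768, vocab_size=30522)

# Explicitly copy pre-trained weights into the module before training
state_dict = torch.load("pretrained_checkpoint.bin", map_location="cpu")
head.load_state_dict(state_dict)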