I am using the transformers library by Hugging Face.
It relies on transfer learning extensively: for example, you load BERT's pre-trained weights with
from_pretrained, and going through its fine-tuning code you can save the new model weights and other hyperparameters with save_pretrained.
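
For context, this is roughly how I am using it (a minimal sketch; the model name and output directory are just examples):

from transformers import BertForSequenceClassification, BertTokenizer

# Download and load the published pre-trained BERT weights
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# ... fine-tune `model` on a downstream task ...

# Save the fine-tuned weights, config and tokenizer files to a directory
model.save_pretrained("./my-finetuned-bert")
tokenizer.save_pretrained("./my-finetuned-bert")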
My doubt is this: in the
modeling_bert code there is no explicit code that takes the pre-trained weights into account and then trains; it just takes the attention matrices and puts them into a feed-forward network, e.g.:
# e.g. in BertLMPredictionHead:
self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
And here I do not see any kind of “inheritance” of the pre-trained weights…
So is this handled internally by PyTorch, or am I missing something?
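
For comparison, this is the kind of explicit weight copying I was expecting to find somewhere (a hypothetical sketch in plain PyTorch, not actual library code; the checkpoint filename is made up):

import torch
import torch.nn as nn

# A toy module with the same decoder layer as above
class Head(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)

head = Head(hidden_size=768, vocab_size=30522)

# Explicitly copy pre-trained weights into the module before training
state_dict = torch.load("pretrained_checkpoint.bin", map_location="cpu")
head.load_state_dict(state_dict)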