How Does Pre-Training Work Internally in PyTorch?

I am using the transformers library by Hugging Face.


It uses transfer learning extensively: for example, we can load BERT's pre-trained model with from_pretrained, and after going through its fine-tuning code we can save the new model weights and other hyperparameters with save_pretrained.
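For context, my workflow looks roughly like this (the model name and output path below are just placeholders):

from transformers import BertModel

# Load the published pre-trained weights into the architecture.
model = BertModel.from_pretrained("bert-base-uncased")

# ... fine-tune the model on my own data ...

# Save the fine-tuned weights and config to a directory.
model.save_pretrained("./my-finetuned-bert")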
My doubt is that in the modeling_bert code there is no explicit code that takes the pre-trained weights into account and then trains; the model definition just constructs the layers, e.g. taking the attention matrices and putting them through a feed-forward network:

self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

And here I do not see any kind of “inheritance” of the pre-trained weights…
So is this handled internally by PyTorch, or am I missing something?
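What I was expecting to find is something like the plain PyTorch pattern below (a hypothetical sketch, not the actual transformers code; MyBertModel and the checkpoint path are made up):

import torch

# Build the architecture with randomly initialized weights,
# just like the layer definitions in modeling_bert.
model = MyBertModel(config)

# Then copy the pre-trained weights into the freshly built modules
# (nn.Linear etc.) by matching parameter names.
state_dict = torch.load("pretrained_weights.bin", map_location="cpu")
model.load_state_dict(state_dict)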

Could you link to the functions you have questions about? The repository is quite large, which makes it hard to follow your question.
Generally, I think the Huggingface devs might have a faster and better answer, so maybe @Thomas_Wolf will find some time to answer your question. :wink: