Inserting custom layer in a pretrained model

I am looking for a way to slightly modify Hugging Face GPT-2’s architecture by inserting a custom feedforward layer inside a GPT-2 decoder block, right after the masked self-attention sublayer. I want to then initialize all original parameters with pre-trained GPT-2 weights and the newly added ones randomly. Is there a way to achieve this by inheriting Hugging Face’s GPT-2 model, instead of copying Hugging Face’s modeling_gpt2 file and then making changes to it?
I’d be really grateful if someone could guide me or point me in the right direction.


Does this help?
It’s not about GPT-2, but maybe you could replicate this for your use case.
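For GPT-2 specifically, one way to do this without copying `modeling_gpt2.py` is to load the pretrained model and then wrap each block's attention sublayer in a small module that applies an extra feedforward layer to the attention output. The pretrained parameters are untouched and the new `nn.Linear` layers get PyTorch's default random initialization. This is a minimal sketch, not an official API: the `AttnWithExtraFF` wrapper class and the choice of a two-layer GELU feedforward with inner size `4 * hidden` are my own assumptions, and the wrapper relies on `GPT2Attention.forward` returning a tuple whose first element is the attention output (true in current Transformers versions, but worth re-checking against your installed version).

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class AttnWithExtraFF(nn.Module):
    """Hypothetical wrapper: runs GPT-2's pretrained masked self-attention,
    then pushes its output through a newly added feedforward layer."""

    def __init__(self, attn, hidden_size, inner_size):
        super().__init__()
        self.attn = attn  # pretrained attention sublayer, reused as-is
        # new parameters, randomly initialized by default
        self.extra_ff = nn.Sequential(
            nn.Linear(hidden_size, inner_size),
            nn.GELU(),
            nn.Linear(inner_size, hidden_size),
        )

    def forward(self, *args, **kwargs):
        # Pass everything through so cache/mask kwargs keep working
        outputs = self.attn(*args, **kwargs)
        attn_output = self.extra_ff(outputs[0])
        # Preserve the rest of the tuple (cache, attention weights, ...)
        return (attn_output,) + outputs[1:]

model = GPT2LMHeadModel.from_pretrained("gpt2")
hidden = model.config.n_embd  # 768 for the base model
for block in model.transformer.h:
    block.attn = AttnWithExtraFF(block.attn, hidden, 4 * hidden)
```

Because the wrapper forwards `*args`/`**kwargs` unchanged, the modified model can be called exactly like the original (and fine-tuned with `Trainer` or a plain training loop); only the tensor flowing out of each attention sublayer now passes through the extra layer first.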
