Hi,
I am looking for a way to slightly modify Hugging Face GPT-2's architecture by inserting a custom feedforward layer inside a GPT-2 decoder block, right after the masked self-attention sublayer. I then want to initialize all original parameters with the pre-trained GPT-2 weights and the newly added ones randomly. Is there a way to achieve this by inheriting from Hugging Face's GPT-2 classes, instead of copying the modeling_gpt2 file and editing it?
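To make the idea concrete, here is a rough sketch of the kind of thing I have in mind. It wraps each block's attention module rather than subclassing the whole model, and the names `AttnWithExtraFF` and `add_extra_ff` are just mine, not part of the library; the exact placement relative to the residual connection would still need adjusting:

```python
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2LMHeadModel

class AttnWithExtraFF(nn.Module):
    """Wraps a GPT2Attention module and applies an extra feedforward
    sublayer (randomly initialized) to its output. Name and layer
    sizes are my own assumptions, not a transformers API."""
    def __init__(self, attn, hidden_size):
        super().__init__()
        self.attn = attn  # the original, pre-trained attention module
        self.extra_ff = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, *args, **kwargs):
        # GPT2Attention returns a tuple whose first element is the
        # attention output; pass the rest (present/attn weights) through.
        outputs = self.attn(*args, **kwargs)
        return (self.extra_ff(outputs[0]),) + tuple(outputs[1:])

def add_extra_ff(model):
    # Swap the attention module of every decoder block for the wrapper.
    for block in model.transformer.h:
        block.attn = AttnWithExtraFF(block.attn, model.config.n_embd)
    return model

# Usage: load pre-trained weights first, then attach the new layers,
# so the original parameters keep their GPT-2 values and the new
# feedforward gets PyTorch's default random init.
# model = add_extra_ff(GPT2LMHeadModel.from_pretrained("gpt2"))
```

Since the wrapper is applied after `from_pretrained`, no weight surgery is needed: everything that existed before keeps its pre-trained values, and only `extra_ff` is random. I'd still like to know whether subclassing `GPT2Block`/`GPT2Model` is the more idiomatic route.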
I’d be really grateful if someone could guide me or point me in the right direction.