Modifying (resizing) an embedding layer for a GPT1 model

dreidizzle · December 29, 2022, 5:46pm

I am trying to add a row to the token embedding for a new set of tokens I want to use with this model, but I'm getting an error. It seems I can get at the weights I need through state_dict and modify them as below, but load_state_dict then complains that the old and new dimensions differ. That is true, but can I get this to work somehow? Basically, I need to resize the Embedding layer …
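Resizing an embedding layer amounts to allocating a larger weight table and copying the existing rows into it. A minimal plain-PyTorch sketch, with toy sizes in place of the real 40478 x 768 table; `grow_embedding` is a hypothetical helper (not the Hugging Face implementation), and zero-initialising the new rows is an assumption.

```python
import torch
import torch.nn as nn


def grow_embedding(old: nn.Embedding, num_new: int) -> nn.Embedding:
    """Return a new Embedding with `num_new` extra rows (hypothetical helper).

    Existing rows are copied over; the new rows are zero-initialised,
    which is an assumption, not what any particular library does.
    """
    old_num, dim = old.weight.shape
    new = nn.Embedding(old_num + num_new, dim,
                       device=old.weight.device, dtype=old.weight.dtype)
    with torch.no_grad():
        new.weight[:old_num] = old.weight  # keep the trained rows
        new.weight[old_num:].zero_()       # fill the added rows with zeros
    return new


# Toy sizes instead of the real vocabulary.
emb = nn.Embedding(10, 4)
bigger = grow_embedding(emb, 1)
print(bigger.weight.shape)  # torch.Size([11, 4])
```

With the real model the result would be assigned back, e.g. `MODEL.tokens_embed = grow_embedding(MODEL.tokens_embed, 1)`. Hugging Face's `resize_token_embeddings` additionally updates `config.vocab_size` and re-ties any tied output layer, which this sketch does not.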

I know Hugging Face has resize_token_embeddings, but what about the approach below?

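import torch
import torch.nn as nn

# MODEL is the OpenAIGPTModel loaded earlier
state_dict = MODEL.state_dict()
# Append one zero-initialised row (the new token) to the embedding matrix
state_dict['tokens_embed.weight'] = nn.Parameter(
    torch.cat(
        (state_dict['tokens_embed.weight'], torch.zeros(1, MODEL.config.n_embd))
    ),
)
# Raises: the model's own tokens_embed still has the old number of rows
MODEL.load_state_dict(state_dict)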

This is the error.

RuntimeError: Error(s) in loading state_dict for OpenAIGPTModel:
size mismatch for tokens_embed.weight: copying a param with shape torch.Size([40479, 768]) from checkpoint, the shape in current model is torch.Size([40478, 768]).
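The error comes from load_state_dict's per-tensor shape check: it copies each checkpoint tensor into the model's existing parameter, so the 40479-row tensor cannot go into a layer that still has 40478 rows. A minimal sketch of one way around it, assuming it is acceptable to replace the layer before loading; `TinyModel` is a hypothetical stand-in with toy sizes, not the real OpenAIGPTModel.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in exposing a `tokens_embed` layer like OpenAIGPTModel."""

    def __init__(self, vocab_size: int = 10, n_embd: int = 4):
        super().__init__()
        self.tokens_embed = nn.Embedding(vocab_size, n_embd)


model = TinyModel()
n_embd = model.tokens_embed.embedding_dim
state_dict = model.state_dict()

# Append one zero row for the new token. state_dict values are plain
# tensors, so no nn.Parameter wrapper is needed here.
state_dict["tokens_embed.weight"] = torch.cat(
    (state_dict["tokens_embed.weight"], torch.zeros(1, n_embd))
)

# Loading into the unchanged model reproduces the error; strict=False
# only relaxes missing/unexpected keys, not shape mismatches.
try:
    model.load_state_dict(state_dict, strict=False)
except RuntimeError as err:
    print("still fails:", "size mismatch" in str(err))

# Fix: give the model a layer of the new size first, then load.
new_vocab = state_dict["tokens_embed.weight"].shape[0]
model.tokens_embed = nn.Embedding(new_vocab, n_embd)
model.load_state_dict(state_dict)
print(model.tokens_embed.weight.shape)  # torch.Size([11, 4])
```

The replacement layer starts out randomly initialised, but load_state_dict then overwrites every row, so the old rows come from the checkpoint and the new row is the appended zero row.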