Add embeddings during runtime

I run into a problem when I try to add new word embeddings dynamically at runtime. I'm doing word sense induction, where a varying number of new embedding rows per batch may need to be added to the existing embeddings. I currently add embeddings using the following code (part of the minimal working example further down):

        # Create some extra embeddings and concatenate those to the original data
        new_embeddings = t.FloatTensor(n, self.embeds.weight.shape[1]).normal_()
        self.embeds.weight.data = t.cat([self.embeds.weight.data, new_embeddings.data])
        # Also set the  num_embeddings variable of the embedding layer
        self.embeds.num_embeddings = self.embeds.weight.shape[0]

This code adds the rows, and the new embeddings are present afterwards.
The forward pass is able to select these new embeddings without problems.
The problem occurs during the backward pass, where I receive the following error:

RuntimeError: Function EmbeddingBackward returned an invalid gradient at index 0 - expected shape [100, 25] but got [110, 25] 

However, I am not able to find where the original shape of the embeddings is stored for this pass. As far as I can tell, the new size is returned everywhere except here.

Am I missing a variable I need to set or is this a problem specific to the Embedding class?

Full minimal working example:

import torch as t

class Model(t.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.embeds = t.nn.Embedding(100, 25, 0)
        self.linear = t.nn.Linear(25, 10)
        self.loss = t.nn.CrossEntropyLoss()

    def add_embedding(self, n=10):
        # Create some extra embeddings and concatenate those to the original data
        new_embeddings = t.FloatTensor(n, self.embeds.weight.shape[1]).normal_()
        self.embeds.weight.data = t.cat([self.embeds.weight.data, new_embeddings.data])
        # Also set the  num_embeddings variable of the embedding layer
        self.embeds.num_embeddings = self.embeds.weight.shape[0]
        
    def forward(self, batch_size=50):
        indices = t.randint(0,self.embeds.weight.shape[0],(batch_size,)).long()
        e = self.embeds(indices)
        l = self.linear(e)
        
        return self.loss(l, t.zeros(batch_size).long())
        
model = Model()
optimizer = t.optim.SGD(model.parameters(), lr=.1)

for i in range(10):
    optimizer.zero_grad()
    l = model.forward()
    l.backward()
    optimizer.step()

# Embedding size is currently:
# model.embeds                                      ->       Embedding(100, 25, padding_idx=0)
# model.embeds.weight.data.shape                    ->       torch.Size([100, 25])
# [l for l in model.embeds.parameters()][0].shape   ->       torch.Size([100, 25])

model.add_embedding()
# After adding the new embeddings, these are the sizes:
# model.embeds                                      ->       Embedding(110, 25, padding_idx=0)
# model.embeds.weight.data.shape                    ->       torch.Size([110, 25])
# [l for l in model.embeds.parameters()][0].shape   ->       torch.Size([110, 25])


optimizer.zero_grad()
l = model.forward()
l.backward()   # This is the line that crashes

# Crashes with error 
""" RuntimeError: Function EmbeddingBackward returned an invalid gradient at index 0 
- expected shape [100, 25] but got [110, 25] 
""" 

Thanks in advance!

Replying to my own question:

Modifying the embedding layer in place does not seem to work, so I tried re-instantiating it instead.
The method below works, with some caveats.

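def add_embedding(self, n=10):
    # Create some extra embeddings and concatenate those to the original data
    new_embeddings = t.FloatTensor(n, self.embeds.weight.shape[1]).normal_()
    e = t.cat([self.embeds.weight.data, new_embeddings])
    # Replace the layer with a larger one, keeping the original padding_idx
    self.embeds = t.nn.Embedding(e.shape[0], e.shape[1], self.embeds.padding_idx)
    # Overwrite the freshly initialised weights with the old rows plus the new ones
    self.embeds.weight.data = e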

The drawback is that the original parameter is no longer part of the model, so the existing optimizer still points at the old weights and cannot update the new embedding layer. A new optimizer therefore has to be created inside the training loop. Doing so throws away any optimizer state (such as momentum buffers), so this only works cleanly with a stateless optimizer such as SGD without momentum.
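To make the optimizer issue concrete, here is a small standalone check. It is only a sketch: the names `old_weight`, `grown`, `tracked`, `still_tracks_old` and `tracks_new` are made up for illustration, and a bare embedding layer stands in for the model.

```python
import torch as t

embeds = t.nn.Embedding(100, 25)
optimizer = t.optim.SGD(embeds.parameters(), lr=0.1)
old_weight = embeds.weight  # the Parameter object the optimizer was built with

# Re-instantiate the layer with 10 extra rows, mirroring add_embedding
grown = t.cat([old_weight.data, t.randn(10, 25)])
embeds = t.nn.Embedding(grown.shape[0], grown.shape[1])
embeds.weight.data = grown

# The optimizer stores references to Parameter objects, not to the module
tracked = optimizer.param_groups[0]["params"][0]
still_tracks_old = tracked is old_weight
tracks_new = tracked is embeds.weight
print(still_tracks_old, tracks_new)  # True False
```

The optimizer keeps holding the old 100-row parameter, which is why calling optimizer.step() on it would leave the new layer untouched.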

Example of a simple training loop:

for i in range(100):
    # New optimizer which points towards the latest versions of the embeddings
    optimizer = t.optim.SGD(model.parameters(), lr=10) 
    optimizer.zero_grad()
    l = model.forward()
    l.backward()
    optimizer.step()
    model.add_embedding()

Still, if anybody knows of a way to modify the original embeddings and keep the original optimizer, I would be glad to hear it.
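One direction that might work, offered as a rough sketch rather than a definitive answer: build the larger layer as before, then swap the new parameter into the existing optimizer's param_groups and carry over its per-parameter state. `grow_embedding` is a hypothetical helper, not a PyTorch function. It assumes that any state tensor with the same shape as the old weight (for example SGD's momentum buffer) can simply be zero-padded for the new rows.

```python
import torch as t


def grow_embedding(embedding, optimizer, n=10):
    """Hypothetical helper: return a copy of `embedding` with n extra rows
    and re-point `optimizer` at the new weight, keeping its state."""
    old_weight = embedding.weight
    new_rows = t.randn(n, old_weight.shape[1],
                       dtype=old_weight.dtype, device=old_weight.device)
    new_weight = t.nn.Parameter(t.cat([old_weight.detach(), new_rows]))

    new_embedding = t.nn.Embedding(new_weight.shape[0], new_weight.shape[1],
                                   padding_idx=embedding.padding_idx)
    new_embedding.weight = new_weight

    # Swap the parameter object inside the optimizer's param groups
    for group in optimizer.param_groups:
        group["params"] = [new_weight if p is old_weight else p
                           for p in group["params"]]

    # Carry over per-parameter state; assumption: tensors shaped like the
    # old weight (e.g. momentum buffers) can be zero-padded for the new rows
    old_state = optimizer.state.pop(old_weight, None)
    if old_state is not None:
        new_state = {}
        for key, value in old_state.items():
            if t.is_tensor(value) and value.shape == old_weight.shape:
                pad = value.new_zeros((n,) + tuple(value.shape[1:]))
                value = t.cat([value, pad])
            new_state[key] = value
        optimizer.state[new_weight] = new_state

    return new_embedding
```

In the model above this would be used as self.embeds = grow_embedding(self.embeds, optimizer, n), so the optimizer has to be passed into add_embedding. I have only reasoned this through for SGD with momentum; other optimizers may keep state that needs different handling.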