I run into a problem when I try to add new word embeddings dynamically at runtime. I'm doing word sense induction, where a varying number of new embedding rows per batch need to be added to the already existing embeddings. I currently add embeddings using the following code (part of a minimal working example, given in full below):

```
# Create some extra embeddings and concatenate those to the original data
new_embeddings = t.FloatTensor(n, self.embeds.weight.shape[1]).normal_()
self.embeds.weight.data = t.cat([self.embeds.weight.data, new_embeddings.data])
# Also set the num_embeddings variable of the embedding layer
self.embeds.num_embeddings = self.embeds.weight.shape[0]
```
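A variant I have not fully explored yet would be to re-register the concatenated tensor as a fresh `Parameter` instead of assigning to `.data` (a sketch; as far as I understand, the optimizer would then also need to be recreated so that it picks up the new `Parameter` object):

```python
import torch as t

embeds = t.nn.Embedding(100, 25, padding_idx=0)

n = 10
new_rows = t.FloatTensor(n, embeds.weight.shape[1]).normal_()
# Re-register the grown tensor as a brand-new Parameter on the module,
# rather than mutating .data of the existing one.
embeds.weight = t.nn.Parameter(t.cat([embeds.weight.data, new_rows]))
embeds.num_embeddings = embeds.weight.shape[0]
```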

This code works for adding the embeddings: they are included afterwards, and the forward pass can select the new embeddings without problems.

The problem occurs during the backward pass, where I receive the following error:

```
RuntimeError: Function EmbeddingBackward returned an invalid gradient at index 0 - expected shape [100, 25] but got [110, 25]
```

However, I am not able to find where the original shape of the embedding weights is stored for this pass. As far as I can tell, the new size is reported everywhere except here.

Am I missing a variable I need to set, or is this a problem specific to the Embedding class?
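For completeness, the obvious workaround (which I would like to avoid, since it also means recreating the optimizer so that it tracks the new parameter) would be to build a fresh, larger `Embedding` layer and copy the old rows over, along these lines:

```python
import torch as t

def grow_embedding(old_layer, n=10):
    # Build a brand-new, larger Embedding and copy the existing rows into it,
    # instead of resizing the old layer's weight in place.
    old_weight = old_layer.weight.data
    grown = t.nn.Embedding(old_weight.shape[0] + n, old_weight.shape[1],
                           padding_idx=old_layer.padding_idx)
    grown.weight.data[:old_weight.shape[0]] = old_weight
    return grown
```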

Full minimal working example:

```
import torch as t


class Model(t.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.embeds = t.nn.Embedding(100, 25, padding_idx=0)
        self.linear = t.nn.Linear(25, 10)
        self.loss = t.nn.CrossEntropyLoss()

    def add_embedding(self, n=10):
        # Create some extra embeddings and concatenate those to the original data
        new_embeddings = t.FloatTensor(n, self.embeds.weight.shape[1]).normal_()
        self.embeds.weight.data = t.cat([self.embeds.weight.data, new_embeddings.data])
        # Also set the num_embeddings variable of the embedding layer
        self.embeds.num_embeddings = self.embeds.weight.shape[0]

    def forward(self, batch_size=50):
        indices = t.randint(0, self.embeds.weight.shape[0], (batch_size,)).long()
        e = self.embeds(indices)
        l = self.linear(e)
        return self.loss(l, t.zeros(batch_size).long())


model = Model()
optimizer = t.optim.SGD(model.parameters(), lr=.1)

for i in range(10):
    optimizer.zero_grad()
    l = model()
    l.backward()
    optimizer.step()

# Embedding size is currently:
# model.embeds -> Embedding(100, 25, padding_idx=0)
# model.embeds.weight.data.shape -> torch.Size([100, 25])
# [p for p in model.embeds.parameters()][0].shape -> torch.Size([100, 25])

model.add_embedding()

# After adding the new embeddings, these are the sizes:
# model.embeds -> Embedding(110, 25, padding_idx=0)
# model.embeds.weight.data.shape -> torch.Size([110, 25])
# [p for p in model.embeds.parameters()][0].shape -> torch.Size([110, 25])

optimizer.zero_grad()
l = model()
l.backward()  # This is the line that crashes with:
# RuntimeError: Function EmbeddingBackward returned an invalid gradient at index 0
# - expected shape [100, 25] but got [110, 25]
```

Thanks in advance!