I am working on Language Modelling. As you probably know, the first part of a Language Model is an **encoder** containing the **Embedding Matrix**. Suppose I have an encoder:

`encoder = nn.Embedding(vocab_size, emb_size)`

whose embedding matrix (the `encoder.weight` parameter) has shape:

`EmbMatrix.shape = [vocab_size, emb_size]`
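For concreteness, here is a minimal sketch of that setup (the sizes are placeholders I picked for illustration):

```python
import torch
import torch.nn as nn

vocab_size, emb_size = 10, 4  # placeholder sizes
encoder = nn.Embedding(vocab_size, emb_size)

# The embedding matrix is the module's weight parameter.
EmbMatrix = encoder.weight
print(EmbMatrix.shape)  # torch.Size([10, 4])
```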

I would like to create a tensor `t` of shape:

`t.shape = [seq_length, batch_size, vocab_size, emb_size]`

such that for every pair of indices `i, j`:

`t[i, j] = EmbMatrix`

I will then apply transformations to this tensor and feed the output into a loss function. What matters most to me is that the gradients propagate properly back to the Embedding Matrix, so that updates change both the **encoder** and **t** in the same way, i.e. after every training iteration:

`EmbMatrix == t[i, j]`
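One approach I have been considering (a minimal sketch; `seq_length` and `batch_size` are placeholder values) is to broadcast the weight with `expand`, which returns a view rather than a copy, so gradients through `t` should accumulate directly in `encoder.weight` and the invariant `t[i, j] == EmbMatrix` holds automatically:

```python
import torch
import torch.nn as nn

vocab_size, emb_size = 10, 4     # placeholder sizes
seq_length, batch_size = 5, 3    # placeholder sizes
encoder = nn.Embedding(vocab_size, emb_size)

# expand creates a view over encoder.weight: no data is copied,
# and backward() sums gradients over the broadcast dimensions
# back into encoder.weight.grad.
t = encoder.weight.expand(seq_length, batch_size, vocab_size, emb_size)

# Dummy loss, just to check that gradients reach the weight.
loss = t.pow(2).sum()
loss.backward()

# Every slice t[i, j] is the same underlying matrix.
assert torch.equal(t[0, 0], encoder.weight)
assert encoder.weight.grad is not None
```

Since `t` is a view, after an optimizer step on `encoder.weight` the slices `t[i, j]` reflect the updated matrix without any extra copying. Is this the right way to do it, or are there pitfalls I am missing?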

Any help is appreciated! Thanks in advance.