I am working on Language Modelling. As you probably know, the first part of a Language Model is an **encoder** containing the **Embedding Matrix**. Suppose I have an encoder:

`encoder = nn.Embedding(vocab_size, emb_size)`

whose embedding matrix (the `encoder.weight` parameter) has shape:

`EmbMatrix.shape = [vocab_size, emb_size]`
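For concreteness, here is a minimal sketch of that setup (the sizes are placeholders I picked for illustration):

```python
import torch
import torch.nn as nn

vocab_size, emb_size = 10, 4  # placeholder sizes
encoder = nn.Embedding(vocab_size, emb_size)

# The embedding matrix is the module's weight parameter.
EmbMatrix = encoder.weight
print(EmbMatrix.shape)  # torch.Size([10, 4])
```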

I would like to create a tensor `t` of shape:

`t.shape = [seq_length, batch_size, vocab_size, emb_size]`

such that for every pair of indices `i, j`:

`t[i, j] = EmbMatrix`

I will then apply transformations to this tensor and feed the output into a loss function. What matters most to me is that the gradients propagate properly back to the Embedding Matrix, so that updates change both the **encoder** and **t** in the same way, i.e. after every training iteration:

`EmbMatrix == t[i, j]`
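One approach I have been considering (a minimal sketch; `seq_length` and `batch_size` are placeholder values) is to broadcast the weight with `expand`, which returns a view rather than a copy, so gradients through `t` should accumulate directly in `encoder.weight` and the invariant `t[i, j] == EmbMatrix` holds automatically:

```python
import torch
import torch.nn as nn

vocab_size, emb_size = 10, 4     # placeholder sizes
seq_length, batch_size = 5, 3    # placeholder sizes
encoder = nn.Embedding(vocab_size, emb_size)

# expand creates a view over encoder.weight: no data is copied,
# and backward() sums gradients over the broadcast dimensions
# back into encoder.weight.grad.
t = encoder.weight.expand(seq_length, batch_size, vocab_size, emb_size)

# Dummy loss, just to check that gradients reach the weight.
loss = t.pow(2).sum()
loss.backward()

# Every slice t[i, j] is the same underlying matrix.
assert torch.equal(t[0, 0], encoder.weight)
assert encoder.weight.grad is not None
```

Since `t` is a view, after an optimizer step on `encoder.weight` the slices `t[i, j]` reflect the updated matrix without any extra copying. Is this the right way to do it, or are there pitfalls I am missing?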

Any help is appreciated! Thanks in advance.