I am working on Language Modelling. As you probably know, the first part of a Language Model is an **encoder** containing the **Embedding Matrix**. Suppose I have an encoder:

`encoder = nn.Embedding(vocab_size, emb_size)`

whose embedding matrix (the `encoder.weight` parameter) has shape:

`EmbMatrix.shape = [vocab_size, emb_size]`
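For concreteness, here is a minimal sketch of that setup (the sizes are placeholders I picked for illustration):

```python
import torch
import torch.nn as nn

vocab_size, emb_size = 10, 4  # placeholder sizes
encoder = nn.Embedding(vocab_size, emb_size)

# The embedding matrix is the module's weight parameter.
EmbMatrix = encoder.weight
print(EmbMatrix.shape)  # torch.Size([10, 4])
```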

I would like to create a tensor `t` of shape:

`t.shape = [seq_length, batch_size, vocab_size, emb_size]`

such that for every pair of indices `i, j`:

`t[i, j] = EmbMatrix`

I will then apply transformations to this tensor and feed the output into a loss function. What matters most to me is that the gradients propagate properly back to the Embedding Matrix, so that updates change both the **encoder** and **t** in the same way, i.e. after every training iteration:

`EmbMatrix == t[i, j]`
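One approach I have been considering (a minimal sketch; `seq_length` and `batch_size` are placeholder values) is to broadcast the weight with `expand`, which returns a view rather than a copy, so gradients through `t` should accumulate directly in `encoder.weight` and the invariant `t[i, j] == EmbMatrix` holds automatically:

```python
import torch
import torch.nn as nn

vocab_size, emb_size = 10, 4     # placeholder sizes
seq_length, batch_size = 5, 3    # placeholder sizes
encoder = nn.Embedding(vocab_size, emb_size)

# expand creates a view over encoder.weight: no data is copied,
# and backward() sums gradients over the broadcast dimensions
# back into encoder.weight.grad.
t = encoder.weight.expand(seq_length, batch_size, vocab_size, emb_size)

# Dummy loss, just to check that gradients reach the weight.
loss = t.pow(2).sum()
loss.backward()

# Every slice t[i, j] is the same underlying matrix.
assert torch.equal(t[0, 0], encoder.weight)
assert encoder.weight.grad is not None
```

Since `t` is a view, after an optimizer step on `encoder.weight` the slices `t[i, j]` reflect the updated matrix without any extra copying. Is this the right way to do it, or are there pitfalls I am missing?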

Any help is appreciated! Thanks in advance.