Hi. I’m currently working on a personal reimplementation of the Transformer from “Attention Is All You Need” and had a question.

On page 5 in section “3.4 Embeddings and Softmax,” it states:

> In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation.

I’ve implemented my model to use a single embedding layer for both the source and target tensors, but I’m wondering whether I can also reuse that embedding layer’s weights as the pre-softmax linear layer. What I’ve currently done is something like:

```
output = previous_layer(previous_input)
# project the hidden states back to vocabulary logits using the embedding weights
final_output = torch.matmul(output, embedding_layer.embedding.weight.transpose(0, 1))
```

I’ve transposed the weight matrix before the matrix multiplication because it has shape `(vocab_size, embedding_dim)`, while `output` has shape `(batch_size, seq_len, embedding_dim)`. Is this the proper way to use an embedding layer as a linear layer? If not, I’d appreciate some tips on what I should be doing instead.
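In case it helps clarify what I mean, here is a minimal standalone sketch of the tied projection (the sizes and variable names are placeholders I made up for illustration, not from my actual model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder sizes for illustration only.
vocab_size, embedding_dim, batch_size, seq_len = 100, 16, 2, 5

embedding = nn.Embedding(vocab_size, embedding_dim)
tokens = torch.randint(0, vocab_size, (batch_size, seq_len))

# Stand-in for the decoder output; shape (batch_size, seq_len, embedding_dim).
hidden = embedding(tokens)

# Tied pre-softmax projection: reuse the embedding matrix as the output layer.
# F.linear(x, W) computes x @ W.T, so no explicit transpose is needed.
logits = F.linear(hidden, embedding.weight)

assert logits.shape == (batch_size, seq_len, vocab_size)
```

As far as I can tell, `F.linear(hidden, embedding.weight)` computes the same thing as my `torch.matmul(..., weight.transpose(0, 1))` version, just without the explicit transpose.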

Thanks.