From the documentation, here are the input and output shapes for an nn.Embedding layer:
Input: LongTensor (N, W), where N = mini-batch size and W = number of indices to extract per mini-batch
Output: (N, W, embedding_dim)
And for a RNN:
input (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() for details.
h_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
If I'm understanding this correctly, the output of the embedding layer needs to have its first two dimensions transposed so that N, the batch size, becomes the second dimension; that is, (W, N, embedding_dim). Am I understanding this correctly?
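To make the shapes concrete, here is a minimal sketch of the flow I'm describing, with hypothetical sizes chosen for illustration (using the default batch_first=False RNN):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration
N, W = 4, 7                 # mini-batch size, indices per example
vocab_size, emb_dim = 100, 16
hidden_size = 32

embedding = nn.Embedding(vocab_size, emb_dim)
rnn = nn.RNN(input_size=emb_dim, hidden_size=hidden_size)

indices = torch.randint(0, vocab_size, (N, W))  # LongTensor of shape (N, W)
embedded = embedding(indices)                   # shape (N, W, emb_dim)

# Transpose the first two dimensions to get (seq_len, batch, input_size)
rnn_input = embedded.transpose(0, 1)            # shape (W, N, emb_dim)
output, h_n = rnn(rnn_input)
print(output.shape)  # torch.Size([7, 4, 32])
```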