# Why does the tensor dimension not seem to matter here?

Hello everybody,

In this NMT tutorial (section: “Training the Model”), the initial `decoder_input` is defined like this:

```
decoder_input = torch.tensor([[SOS_token]], device=device)
```

This returns a tensor that looks like this:

```
tensor([[0]], device='cuda:0') torch.Size([1, 1])
```

Now, after doing a forward pass in the Decoder a few lines below, the next `decoder_input` is defined like this:

```
decoder_input = target_tensor[di]
```

This returns a tensor that looks like this:

```
tensor([129], device='cuda:0') torch.Size([1])
```

For context, `decoder_input` is fed into the decoder's `forward()` function as `inp`:

```
    def forward(self, inp, hidden, encoder_outputs):
        embedded = self.embedding(inp).view(1, 1, -1)
        ...
```

The embedding layer is defined as this:

`self.embedding = nn.Embedding(self.output_size, self.hidden_size)` (`output_size` is the vocabulary size of the output language).

Why does the difference in shape not matter here? If it does, what should I use and why?
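
For reference, here is a minimal sketch that reproduces the two shapes (the `target_tensor` values below are made up, and the `device` line is just a stand-in for the tutorial's setup):

```
import torch

SOS_token = 0
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Initial input: shape [1, 1]
decoder_input = torch.tensor([[SOS_token]], device=device)
print(decoder_input.shape)  # torch.Size([1, 1])

# Input taken from the target tensor: shape [1]
target_tensor = torch.tensor([[129], [57], [1]], device=device)  # made-up token ids
decoder_input = target_tensor[0]
print(decoder_input.shape)  # torch.Size([1])
```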

Hi,
In the simplest terms, `nn.Embedding(4, 3)` acts as a look-up table that maps each of the 4 indices 0, 1, 2, 3 to a 3-dimensional vector.

The input shape does matter insofar as it determines the output shape, like so:

```
import torch
import torch.nn as nn

x = torch.tensor([1, 2, 0, 3])    # a single sequence, 1-dimensional
y = torch.tensor([[1, 2, 0, 3]])  # batch_size x sequence_length
emb = nn.Embedding(4, 3)
x_emb = emb(x)
y_emb = emb(y)
print(x_emb, y_emb)
```

gives:

```
tensor([[ 0.4080,  1.3991,  1.1883],
        [-0.3503,  0.1206,  0.2660],
        [ 0.8378, -0.3656,  1.6117],
        ...])
tensor([[[ 0.4080,  1.3991,  1.1883],
         [-0.3503,  0.1206,  0.2660],
         [ 0.8378, -0.3656,  1.6117],
         ...]])
```

Note that `x_emb` has shape `[4, 3]`, while `y_emb` has an extra leading batch dimension: `[1, 4, 3]`.

If I understand you right, this would mean that `y` in your example corresponds to a batch size of 1.
And since the output of the embedding layer is reshaped with `.view(1, 1, -1)`, the final output of the embedding layer (`embedded`) always has the same shape, regardless of the extra batch dimension.
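
A quick sketch to check this, reusing the toy sizes from above (vocab size 4, embedding dimension 3; the input tensors are made up):

```
import torch
import torch.nn as nn

emb = nn.Embedding(4, 3)  # vocab size 4, embedding dimension 3

a = torch.tensor([[0]])  # shape [1, 1], like the initial decoder_input
b = torch.tensor([2])    # shape [1], like target_tensor[di]

# emb(a) has shape [1, 1, 3]; emb(b) has shape [1, 3].
# After .view(1, 1, -1), both end up as [1, 1, 3].
print(emb(a).view(1, 1, -1).shape)  # torch.Size([1, 1, 3])
print(emb(b).view(1, 1, -1).shape)  # torch.Size([1, 1, 3])
```

So as long as the input holds a single token index, `.view(1, 1, -1)` normalizes it to the `(seq_len=1, batch=1, hidden_size)` shape the decoder's GRU expects.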