# Why does the tensor dimension not seem to matter here?

Hello everybody,

In this NMT tutorial (section: “Training the Model”), the initial `decoder_input` is defined like this:

```
decoder_input = torch.tensor([[SOS_token]], device=device)
```

This returns a tensor that looks like this:

```
tensor([[0]], device='cuda:0') torch.Size([1, 1])
```

Now, after doing a forward pass in the Decoder a few lines below, the next `decoder_input` is defined like this:

```
decoder_input = target_tensor[di]
```

This returns a tensor that looks like this:

```
tensor([129], device='cuda:0') torch.Size([1])
```

For context, `decoder_input` is fed into the decoder's `forward()` function as `inp`:

```
    def forward(self, inp, hidden, encoder_outputs):
        embedded = self.embedding(inp).view(1, 1, -1)
        ...
```

The embedding layer is defined as this:

`self.embedding = nn.Embedding(self.output_size, self.hidden_size)` (`output_size` is the vocabulary size of the output language).

Why does the difference in shape not matter here? If it does, what should I use and why?
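
For reference, here is a minimal sketch that reproduces the two shapes (the `target_tensor` values below are made up, and the `device` line is just a stand-in for the tutorial's setup):

```
import torch

SOS_token = 0
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Initial input: shape [1, 1]
decoder_input = torch.tensor([[SOS_token]], device=device)
print(decoder_input.shape)  # torch.Size([1, 1])

# Input taken from the target tensor: shape [1]
target_tensor = torch.tensor([[129], [57], [1]], device=device)  # made-up token ids
decoder_input = target_tensor[0]
print(decoder_input.shape)  # torch.Size([1])
```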

Hi,
In the simplest terms, `nn.Embedding(4, 3)` acts as a look-up table that maps each of the 4 indices 0, 1, 2, 3 to a 3-dimensional vector.

The input shape does matter insofar as it determines the output shape, like so:

```
import torch
import torch.nn as nn

x = torch.tensor([1, 2, 0, 3])    # a single sequence, 1-dimensional
y = torch.tensor([[1, 2, 0, 3]])  # batch_size x sequence_length
emb = nn.Embedding(4, 3)
x_emb = emb(x)
y_emb = emb(y)
print(x_emb, y_emb)
```

gives:

```
tensor([[ 0.4080,  1.3991,  1.1883],
        [-0.3503,  0.1206,  0.2660],
        [ 0.8378, -0.3656,  1.6117],
        ...])
tensor([[[ 0.4080,  1.3991,  1.1883],
         [-0.3503,  0.1206,  0.2660],
         [ 0.8378, -0.3656,  1.6117],
         ...]])
```

Note that `x_emb` has shape `[4, 3]`, while `y_emb` has an extra leading batch dimension: `[1, 4, 3]`.

If I understand you right, this would mean that `y` in your example corresponds to a batch size of 1.
And since the output of the embedding layer is reshaped with `.view(1, 1, -1)`, the final output of the embedding layer (`embedded`) always has the same shape, regardless of the extra batch dimension.
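
A quick sketch to check this, reusing the toy sizes from above (vocab size 4, embedding dimension 3; the input tensors are made up):

```
import torch
import torch.nn as nn

emb = nn.Embedding(4, 3)  # vocab size 4, embedding dimension 3

a = torch.tensor([[0]])  # shape [1, 1], like the initial decoder_input
b = torch.tensor([2])    # shape [1], like target_tensor[di]

# emb(a) has shape [1, 1, 3]; emb(b) has shape [1, 3].
# After .view(1, 1, -1), both end up as [1, 1, 3].
print(emb(a).view(1, 1, -1).shape)  # torch.Size([1, 1, 3])
print(emb(b).view(1, 1, -1).shape)  # torch.Size([1, 1, 3])
```

So as long as the input holds a single token index, `.view(1, 1, -1)` normalizes it to the `(seq_len=1, batch=1, hidden_size)` shape the decoder's GRU expects.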