I am halfway through the official seq2seq tutorial: http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html .
I successfully trained the no-attention network, but I had to make one fix that I would like to ask about.
```python
class EncoderRNN(nn.Module):
    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)  # What is .view(1, 1, -1) for?
        output = embedded
        for i in range(self.n_layers):
            output, hidden = self.gru(output, hidden)
        return output, hidden
```
As I understand it, the proper tensor sizes are:
```python
input    # [sequenceLength, 1]
hidden   # [1, 1, hiddenSize]
embedded = self.embedding(input)   # [sequenceLength, 1, hiddenSize]
output = embedded.view(1, 1, -1)   # everything works without .view(1, 1, -1)
output, hidden = self.gru(output, hidden)
# The GRU expects its input to have size [sequenceLength, batch=1, hiddenSize],
# which means .view(1, 1, -1) would collapse the input so that
# sequenceLength becomes 1 and hiddenSize becomes sequenceLength * hiddenSize.
```
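To illustrate what I mean, here is a minimal sketch of the two cases (the sizes are made up for the example): if the encoder is fed one token per `forward()` call, `.view(1, 1, -1)` leaves the shape as `[1, 1, hiddenSize]`, but if it is fed the whole sequence at once, the same view flattens it to `[1, 1, sequenceLength * hiddenSize]`:

```python
import torch
import torch.nn as nn

vocab_size, hidden_size, seq_len = 10, 4, 3  # hypothetical sizes
embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)

# Case 1: one token per forward() call
token = torch.tensor([[2]])                  # shape [1, 1]
emb = embedding(token).view(1, 1, -1)        # shape [1, 1, hidden_size] -- view is a no-op here
out, h = gru(emb, torch.zeros(1, 1, hidden_size))

# Case 2: the whole sequence at once
seq = torch.tensor([[1], [2], [3]])          # shape [seq_len, 1]
emb_seq = embedding(seq)                     # shape [seq_len, 1, hidden_size]
flat = emb_seq.view(1, 1, -1)                # shape [1, 1, seq_len * hidden_size] -- wrong for the GRU
```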
Can someone explain how it is supposed to work with `.view(1, 1, -1)`?