rnn = nn.LSTM(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
output, hn = rnn(input)
In the above example, the input to nn.LSTM must be a 3-dimensional tensor of shape (seq_len, batch, input_size). However, in tasks such as image captioning and machine translation, the LSTM's input cannot be determined in advance at evaluation time. During the evaluation phase, the LSTM should take a 2D tensor, generate an output, and feed that output back in as the input to the next time step.
How can I deal with this problem when using nn.LSTM in my image captioning model?
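One possible approach, as a rough sketch rather than a definitive answer: keep nn.LSTM, but call it one time step at a time by adding a length-1 time axis to the 2D input with unsqueeze(0), and pass the returned state into the next call. The embed and to_vocab layers, the sizes, the greedy argmax choice, and the greedy_decode helper are all assumptions made up for illustration; they are not part of the original snippet.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size, embed_size, hidden_size, num_layers = 50, 10, 20, 2
batch_size, max_len = 3, 5

embed = nn.Embedding(vocab_size, embed_size)    # token id -> LSTM input
rnn = nn.LSTM(embed_size, hidden_size, num_layers)
to_vocab = nn.Linear(hidden_size, vocab_size)   # LSTM output -> token scores


def greedy_decode(first_input, state=None):
    """Run the LSTM one step at a time, feeding back its own prediction."""
    step_input = first_input                    # 2D: (batch, embed_size)
    tokens = []
    with torch.no_grad():
        for _ in range(max_len):
            # Add a length-1 time axis: (batch, embed) -> (1, batch, embed).
            output, state = rnn(step_input.unsqueeze(0), state)
            scores = to_vocab(output.squeeze(0))    # (batch, vocab_size)
            next_token = scores.argmax(dim=1)       # (batch,)
            tokens.append(next_token)
            step_input = embed(next_token)          # next step's 2D input
    return torch.stack(tokens), state               # tokens: (max_len, batch)


# In a captioning model, first_input might be the projected image feature.
image_features = torch.randn(batch_size, embed_size)
tokens, (h_n, c_n) = greedy_decode(image_features)
print(tokens.shape)  # torch.Size([5, 3])
```

If a 2D input is preferred at every step, nn.LSTMCell accepts a (batch, input_size) tensor directly, although a multi-layer network then has to be built by stacking cells by hand.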