I am working on image captioning and am trying to migrate to nn.LSTMCell for speed.
However, with a batch size larger than one, I cannot assign the features extracted from each image in the batch to that image's own hidden state. The h0 passed to nn.LSTM, of size layer_num x len_sequence x hidden_size, seems to be the same for every sequence in the batch regardless of the input image, which does not make sense for this task.
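To make the goal concrete, here is a minimal sketch of the per-image initialization in question. It assumes, going by the PyTorch documentation for nn.LSTM, that h0 has shape (num_layers, batch, hidden_size) for a unidirectional LSTM, so that its second dimension would index the batch rather than the sequence. The sizes, the init_h projection layer, and the random tensors standing in for image features and embedded captions are all hypothetical and only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len = 4, 7
feat_dim, embed_dim, hidden_size, num_layers = 512, 32, 64, 2

# Hypothetical projection from image features to an initial hidden state.
init_h = nn.Linear(feat_dim, hidden_size)
lstm = nn.LSTM(embed_dim, hidden_size, num_layers=num_layers, batch_first=True)

# Random stand-ins: one feature vector per image, one embedded caption per image.
image_feats = torch.randn(batch_size, feat_dim)
captions = torch.randn(batch_size, seq_len, embed_dim)

# Assumption: h0 and c0 are (num_layers, batch, hidden_size), even with
# batch_first=True, so h0[:, i, :] belongs to the i-th image only.
h0 = init_h(image_feats).unsqueeze(0).repeat(num_layers, 1, 1)
c0 = torch.zeros_like(h0)

output, (hn, cn) = lstm(captions, (h0, c0))
print(h0.shape, output.shape)
```

In this sketch the same projected feature is copied to every layer and the cell state starts at zero; both are arbitrary choices, and a separate projection per layer or for c0 would work the same way.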
Is there any solution? Thanks in advance.