I am working on image captioning and am trying to migrate from nn.LSTMCell to nn.LSTM for speed.
However, when using a batch size larger than one, I cannot assign the features extracted from each image in the batch to its own hidden state. The h0 passed to nn.LSTM, of size layer_num x len_sequence x hidden_size, seems to be the same for every sequence in the batch regardless of the input image, which is not reasonable for this task.
Is there any solution? Thanks in advance.
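For reference, the PyTorch documentation gives the shape of h_0 for a unidirectional nn.LSTM as (num_layers, batch, hidden_size), so the second dimension indexes the samples in the batch rather than the positions in the sequence. Below is a minimal sketch of the per-image initialization I am after, under that reading. All sizes are made up, and the init_h projection layer is a hypothetical helper, not something from my actual model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len = 4, 7
feat_dim, embed_dim, hidden_size, num_layers = 512, 32, 64, 2

# Stand-ins for CNN image features and embedded caption tokens.
image_features = torch.randn(batch_size, feat_dim)
word_embeddings = torch.randn(batch_size, seq_len, embed_dim)

# Hypothetical projection from image features to the LSTM hidden size.
init_h = nn.Linear(feat_dim, hidden_size)
lstm = nn.LSTM(embed_dim, hidden_size, num_layers=num_layers, batch_first=True)

# (batch, hidden) -> (num_layers, batch, hidden): one initial state per image.
# batch_first=True only affects the input/output layout, not h0/c0.
h0 = init_h(image_features).unsqueeze(0).repeat(num_layers, 1, 1)
c0 = torch.zeros_like(h0)

output, (hn, cn) = lstm(word_embeddings, (h0, c0))
print(h0.shape, output.shape)
```

Here h0[:, i] is the initial hidden state for the i-th image in the batch, so each sequence would start from its own image features.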