How can I initialize the hidden state differently for each sample in a batch?

I am working on image captioning and am trying to migrate from nn.LSTMCell to nn.LSTM for speed.

However, with a batch size larger than one, I cannot work out how to give each image in the batch its own initial hidden state computed from its extracted features. As I understand it, the h0 passed to nn.LSTM has size layer_num x len_sequence x hidden_size, so it seems to be shared by every sequence in the batch regardless of the input image, which does not make sense for this task.

Is there a solution? Thanks in advance.
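
For concreteness, here is a minimal sketch of the kind of per-sample initialization being asked about. It relies on the documented shape of h_0 for nn.LSTM, which is (num_layers * num_directions, batch, hidden_size), so the second dimension indexes samples in the batch rather than time steps. All sizes, the `init_h` projection layer, and the random stand-in tensors are assumptions made up for illustration, not part of any particular captioning model.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_len = 4, 7
feat_size, embed_size, hidden_size, num_layers = 512, 32, 64, 2

lstm = nn.LSTM(embed_size, hidden_size, num_layers=num_layers, batch_first=True)

# Hypothetical projection: image features -> one initial hidden vector per layer.
init_h = nn.Linear(feat_size, num_layers * hidden_size)

# Random stand-ins for real data.
image_feats = torch.randn(batch_size, feat_size)          # one feature vector per image
captions = torch.randn(batch_size, seq_len, embed_size)   # already-embedded caption tokens

# (batch, num_layers * hidden) -> (batch, num_layers, hidden) -> (num_layers, batch, hidden)
# batch_first=True affects only the input/output tensors, not h0/c0.
h0 = init_h(image_feats).view(batch_size, num_layers, hidden_size)
h0 = h0.transpose(0, 1).contiguous()
c0 = torch.zeros_like(h0)  # assumption: start the cell state at zero

output, (hn, cn) = lstm(captions, (h0, c0))
print(h0.shape)      # (num_layers, batch_size, hidden_size)
print(output.shape)  # (batch_size, seq_len, hidden_size)
```

Under these assumptions, `h0[:, i]` holds the initial state for the i-th image only, so each sequence in the batch starts from its own features.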