Understanding RNN and sequences with images

Hi, I'm having some trouble understanding how to use a CNN + RNN combination for a 2-class segmentation problem.

I already have a CNN which produces a Bx2xHxW output, and I want to reuse it with an RNN (maybe a GRU) to produce another output that also depends on previously seen frames.

In other words, I was thinking about storing the CNN outputs in a sequence so that I could feed it to the GRU and get my actual output. A rough sketch of what I have in mind is below.
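This is just a minimal sketch of the pipeline I'm imagining, with a placeholder conv layer standing in for my real CNN and toy sizes; I'm guessing that the Bx2xHxW maps should be flattened per frame before going into the GRU:

```python
import torch
import torch.nn as nn

# Toy sizes for illustration only; my real CNN produces a Bx2xHxW map per frame.
B, T, H, W = 4, 10, 32, 32          # batch size, frames per sequence, spatial size
feat_dim = 2 * H * W                # flattened CNN output per frame
hidden_dim = 128

cnn = nn.Conv2d(3, 2, kernel_size=3, padding=1)   # stand-in for my segmentation CNN
gru = nn.GRU(input_size=feat_dim, hidden_size=hidden_dim, batch_first=True)

frames = torch.randn(B, T, 3, H, W)               # B sequences of T RGB frames

# Run the CNN on every frame, then stack the per-frame outputs into a sequence.
cnn_out = cnn(frames.view(B * T, 3, H, W))        # (B*T, 2, H, W)
seq = cnn_out.view(B, T, feat_dim)                # (B, T, 2*H*W)

out, h_n = gru(seq)                               # out: (B, T, hidden_dim)
```

Is this roughly the right way to turn per-frame CNN outputs into an RNN input, or am I missing something?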

What I don't understand is how to use torch.nn.utils.rnn.pack_padded_sequence to build my sequence. Can anyone give an example and explain what it does?
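This is my (possibly wrong) attempt so far, in case it helps clarify what I'm confused about. I'm assuming that `lengths` is the number of valid (unpadded) frames in each sequence and that packing makes the GRU skip the padded time steps:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

feat_dim, hidden_dim = 16, 32
gru = nn.GRU(input_size=feat_dim, hidden_size=hidden_dim, batch_first=True)

# Two sequences of different lengths, zero-padded to the same max length.
lengths = torch.tensor([5, 3])                    # true number of frames per sequence
padded = torch.zeros(2, 5, feat_dim)              # (batch, max_len, features)
padded[0, :5] = torch.randn(5, feat_dim)
padded[1, :3] = torch.randn(3, feat_dim)

# Pack so the GRU only sees the real time steps.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
packed_out, h_n = gru(packed)

# Unpack back to a padded (batch, max_len, hidden) tensor if needed.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```

Is this what pack_padded_sequence is for, or am I misusing it?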

Also, I don't understand how to handle mini-batches. Should each element of a mini-batch refer to a different sequence?
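To make the question concrete, here is how I imagine a mini-batch would be built (I'm assuming pad_sequence is the intended way to pad several independent sequences, e.g. three different videos, to the same length):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

feat_dim = 16
seq_a = torch.randn(7, feat_dim)   # 7 frames
seq_b = torch.randn(4, feat_dim)   # 4 frames
seq_c = torch.randn(5, feat_dim)   # 5 frames

# One mini-batch = several independent sequences, padded to the longest one.
batch = pad_sequence([seq_a, seq_b, seq_c], batch_first=True)   # (3, 7, feat_dim)
lengths = torch.tensor([7, 4, 5])
```

Is that the right mental model, or should the frames of a single sequence be spread across the batch dimension instead?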