I am wondering what the current best practice is for handling variable-length image sequences with CNNs.
I want to encode my variable-length image sequences with a CNN and then feed the resulting features into an RNN.
To enable batch processing I pad my image sequences with zero-frames. Then I stack the sequence dimension into the batch dimension and feed everything through the CNN, like this:

batch_size, sequence_length, channels, height, width = image_sequence.size()

# CREATE EMBEDDING
# Reshape BxSx... to (B*S)x... so it can be fed into the CNN
image_sequence = image_sequence.view(-1, channels, height, width)
embeds = self.cnn(image_sequence)

# Reshape back from (B*S)x... to BxSx...
embeds = embeds.view(batch_size, sequence_length, -1)
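(For context, here is a self-contained sketch of one possible workaround I have seen: select only the non-padding frames with a length mask before the CNN, then scatter the features back into the padded B×S layout. The tiny `cnn`, the sizes, and the `lengths` tensor are made up for illustration, not part of my actual model.)

```python
import torch
import torch.nn as nn

B, S, C, H, W = 3, 4, 3, 8, 8
lengths = torch.tensor([4, 2, 1])          # true length of each sequence
image_sequence = torch.randn(B, S, C, H, W)

# toy CNN producing a 6-dim feature per frame
cnn = nn.Sequential(nn.Conv2d(C, 6, 3, padding=1),
                    nn.AdaptiveAvgPool2d(1),
                    nn.Flatten())

# boolean mask of real (non-padding) frames, shape (B, S)
mask = torch.arange(S).unsqueeze(0) < lengths.unsqueeze(1)

flat = image_sequence.view(B * S, C, H, W)
real_frames = flat[mask.view(-1)]          # only the 4 + 2 + 1 = 7 real frames
feats = cnn(real_frames)                   # (7, 6), CNN never sees padding

# scatter back into the padded (B, S, feat) layout, zeros where padded
embeds = feats.new_zeros(B * S, feats.size(1))
embeds[mask.view(-1)] = feats
embeds = embeds.view(B, S, -1)             # (3, 4, 6)
```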
Now for RNNs there is http://pytorch.org/docs/master/nn.html#torch.nn.utils.rnn.pack_padded_sequence, but my CNN still has to run on a lot of padding frames. Is there something like pack_padded_sequence for CNNs?
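(For reference, this is how I use pack_padded_sequence on the CNN embeddings so at least the RNN skips the padding; the sizes, the random `embeds` standing in for the CNN output, and the GRU are placeholders for illustration.)

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch_size, max_len, feat_dim = 4, 5, 16
lengths = torch.tensor([5, 3, 2, 2])       # true lengths, sorted descending

# stand-in for the (B, S, feat) embeddings coming out of the CNN
embeds = torch.randn(batch_size, max_len, feat_dim)

rnn = nn.GRU(feat_dim, 8, batch_first=True)

# pack so the RNN never processes the zero-padded time steps
packed = pack_padded_sequence(embeds, lengths, batch_first=True)
packed_out, hidden = rnn(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)  # (4, 5, 8)
```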