How to handle variable length of sequences when using RNNCell

ChangGao · July 22, 2020, 6:17am

Hi! I am considering using the RNNCell to handle a sequence task. I know that batch processing with padding and packing can accelerate the calculation. However, it is not feasible in my task because the data needs preprocessing at each time step.

Now I am considering using RNNCell to output the hidden state of the RNN at each time step, but I can't figure out how to handle variable-length sequences. I don't know whether masking the output would work.

Has anybody had this problem? Thanks for your help!

meng_lin · July 23, 2020, 1:44pm

You can find the answer to this question by reading any project on GitHub that uses teacher forcing. It may not be the most efficient solution, but it works.

github.com/poojahira/image-captioning-with-attention/blob/master/models.py

import torch
from torch import nn
import torchvision

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class Encoder(nn.Module):
    """
    Encoder.
    """

    def __init__(self, encoded_image_size=14):
        super(Encoder, self).__init__()
        self.enc_image_size = encoded_image_size

        resnet = torchvision.models.resnet101(pretrained=True)  # pretrained ImageNet ResNet-101

        # Remove linear and pool layers (since we're not doing classification)
        modules = list(resnet.children())[:-2]

This file has been truncated.

Here is one example; see the forward function of the DecoderWithAttention class.
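
A common pattern in step-by-step decoders of this kind is to sort the batch by decreasing length and, at each time step, feed the cell only the sequences that are still active. The sketch below illustrates that idea with nn.RNNCell. It is not the code from the linked repository: the function name run_cell_shrinking_batch and its arguments are hypothetical, and it assumes batch-first, padded inputs with every length at least 1.

```python
import torch
from torch import nn


def run_cell_shrinking_batch(cell, inputs, lengths):
    """Run an nn.RNNCell over padded inputs, shrinking the batch as sequences end.

    (Hypothetical helper, for illustration only.)
    inputs:  (batch, max_len, input_size), padded along the time axis
    lengths: (batch,) true sequence lengths, each assumed >= 1
    Returns:
      outputs: (batch, max_len, hidden_size), zeros past each sequence's end
      last_h:  (batch, hidden_size), hidden state at each sequence's last step
    Both are returned in the original batch order.
    """
    batch, max_len, _ = inputs.shape

    # Sort by decreasing length so the active sequences always form a prefix.
    sorted_lengths, order = lengths.sort(descending=True)
    inputs = inputs[order]

    h = inputs.new_zeros(batch, cell.hidden_size)
    outputs = inputs.new_zeros(batch, max_len, cell.hidden_size)

    for t in range(int(sorted_lengths[0])):
        # Number of sequences that still have a real element at step t.
        n_active = int((sorted_lengths > t).sum())

        # Any per-step preprocessing of the input would go here.
        x_t = inputs[:n_active, t]

        h_active = cell(x_t, h[:n_active])

        # Finished sequences keep the hidden state from their last real step.
        h = torch.cat([h_active, h[n_active:]], dim=0)
        outputs[:n_active, t] = h_active

    # argsort of a permutation is its inverse, which restores the input order.
    inverse = order.argsort()
    return outputs[inverse], h[inverse]


if __name__ == "__main__":
    cell = nn.RNNCell(input_size=3, hidden_size=5)
    x = torch.randn(4, 6, 3)
    lengths = torch.tensor([2, 6, 1, 4])
    outputs, last_h = run_cell_shrinking_batch(cell, x, lengths)
    print(outputs.shape, last_h.shape)
```

Masking is an alternative that also works: run the cell on the full batch at every step and keep the old state for finished sequences, for example with torch.where(mask_t, h_new, h), where mask_t is a (batch, 1) boolean tensor marking the sequences still active at step t. That avoids the sort at the cost of computing on padded positions.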