Do we need to set a fixed input sentence length when we use padding-packing with RNN?

Do we need to define a fixed sentence length when we’re using padding and packing for RNNs? I just built a small RNN for text classification and realized (after successfully training and testing it) that I never specified a sentence length to fix the number of input neurons of the RNN. PyTorch did not raise any errors.

Am I doing something wrong here? Don’t we have to specify a sentence length in order to define the input length for the RNN?

Please have a look at the following snippets of code I used. The input to the DataLoader is a list of variable-length tensors, e.g. [[1, 2, 3], [4, 5]].

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence


class SampleData(Dataset):

    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data

    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]

    def __len__(self):
        return len(self.X_data)

BATCH_SIZE = 2
EMBEDDING_SIZE = 5
VOCAB_SIZE = len(word2idx)
TARGET_SIZE = len(tag2idx)
HIDDEN_SIZE_SAMPLE = 3
STACKED_LAYERS = 4

sample_data = SampleData(X_train, y_train)
# The identity collate_fn keeps each batch as a list of variable-length samples
# instead of stacking them into one tensor (which would fail for unequal lengths).
sample_loader = DataLoader(sample_data, batch_size=BATCH_SIZE, collate_fn=lambda x: x)

# X_batch
# [[421, 287, 2480, 1961], [399, 2269, 891, 2355, 353, 406, 1310]] 

# y_batch
# [1, 1] 
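With collate_fn=lambda x: x, the DataLoader yields each batch as a plain list of (X, y) tuples; here is a minimal sketch of how such a batch can be unpacked:

for batch in sample_loader:
    # batch is a list like [(X1, y1), (X2, y2)]; zip(*batch) separates X and y
    X_batch, y_batch = zip(*batch)
    X_batch = list(X_batch)                  # list of variable-length tensors
    y_batch = torch.tensor(y_batch).float()  # e.g. tensor([1., 1.])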


class ModelGRU(nn.Module):
    
    def __init__(self, embedding_size, vocab_size, hidden_size, target_size, stacked_layers):
        super(ModelGRU, self).__init__()
        
        self.word_embeddings = nn.Embedding(num_embeddings = vocab_size, embedding_dim = embedding_size)
        self.gru = nn.GRU(input_size = embedding_size, hidden_size = hidden_size, batch_first = True, num_layers=stacked_layers)
        self.linear = nn.Linear(in_features = hidden_size, out_features=1)
        

    def forward(self, x_batch):

        # Lengths of the original (unpadded) sequences in this batch.
        len_list = list(map(len, x_batch))

        # Pad every sequence to the length of the longest one in the batch.
        padded_batch = pad_sequence(x_batch, batch_first=True)
        embeds = self.word_embeddings(padded_batch)

        # Pack the padded embeddings so the GRU ignores the padded positions.
        pack_embeds = pack_padded_sequence(embeds, lengths=len_list, batch_first=True, enforce_sorted=False)

        rnn_out, rnn_hidden = self.gru(pack_embeds)

        # rnn_hidden has shape (num_layers, batch, hidden_size); applying the
        # linear layer and taking [-1] keeps only the last layer's prediction.
        linear_out = self.linear(rnn_hidden)
        y_out = torch.sigmoid(linear_out)
        y_out = y_out[-1]

        return y_out


gru_model = ModelGRU(embedding_size=EMBEDDING_SIZE, vocab_size=VOCAB_SIZE, hidden_size=HIDDEN_SIZE_SAMPLE, target_size=TARGET_SIZE, stacked_layers=STACKED_LAYERS)

# ModelGRU(
#   (word_embeddings): Embedding(2728, 5)
#   (gru): GRU(5, 3, num_layers=4, batch_first=True)
#   (linear): Linear(in_features=3, out_features=1, bias=True)
# )
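For instance, the following sanity check (with made-up word indices, assumed to be smaller than VOCAB_SIZE) runs without any fixed sentence length:

# Two sentences of different lengths as LongTensors of word indices
x_batch = [torch.tensor([421, 287, 2480, 1961]),
           torch.tensor([399, 2269, 891, 2355, 353, 406, 1310])]

y_pred = gru_model(x_batch)
print(y_pred.shape)  # torch.Size([2, 1]) -- one probability per sentence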



Everything runs perfectly fine. But have a look at the following picture: a sentence with a different number of words should lead to a different number of input neurons to the RNN. Yet by using padding and packing, I have apparently bypassed that step somehow.

Please tell me what I’m doing wrong. I can’t seem to figure it out.

The sequence length has nothing to do with the number of neurons. The size is all set here:

self.gru = nn.GRU(input_size = embedding_size, hidden_size = hidden_size, batch_first = True, num_layers=stacked_layers)

Your figure shows an “unrolled” RNN, but that is just a way to visualize the recurrent nature of RNNs. Each blue box labeled LSTM is the exact same network.

So the input is my embedding layer, right? And an embedding is simply a lookup performed for each word in a sentence.

Let’s say my vocab size = 10 and I define my embedding dim = 5. So, each word in my vocab is represented as a 1x5 vector. Now, this vector is fed into the RNN.
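For example, a quick sketch of that lookup:

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=5)

sentence = torch.tensor([3, 1, 4, 1, 5, 9])  # 6 word indices from a vocab of 10
vectors = embedding(sentence)
print(vectors.shape)  # torch.Size([6, 5]) -- one 1x5 vector per word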

What I cannot seem to wrap my head around is this: if I have two sentences of 6 and 8 words, then I will send the RNN sequences of 6 and 8 vectors, each of which is a 1x5 vector. But I still have to decide on the input length, right?

In a PyTorch RNN, seq_len is the length of the input we want to consider, right? So if seq_len does not determine the number of neurons, what does?

The RNN sees each word, i.e., a vector of size 5, step by step. If there are 6 words, the RNN sees 6 vectors and then stops; same with 8 words. Your confusion might stem from the fact that nn.LSTM and nn.GRU hide this step-wise processing: you give the model a sequence of a certain length, but internally the model loops over the sequence. More words just mean more loop iterations before it is finished.
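You can make this internal loop explicit with nn.GRUCell; here is a minimal sketch for a single, unbatched sequence (the names are mine, not from your code):

import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=5, hidden_size=3)

def run_rnn(word_vectors):               # word_vectors: (seq_len, 5)
    h = torch.zeros(1, 3)                # initial hidden state
    for vec in word_vectors:             # one step per word
        h = cell(vec.unsqueeze(0), h)    # the same cell (same weights) every step
    return h                             # final hidden state, shape (1, 3)

print(run_rnn(torch.randn(6, 5)).shape)  # torch.Size([1, 3])
print(run_rnn(torch.randn(8, 5)).shape)  # torch.Size([1, 3]) -- just more loops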

Obviously, things get problematic with batches if the sequences in a batch have different lengths. The standard solution is to pad all shorter sequences to the length of the longest sequence.
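For example, a minimal sketch with pad_sequence:

import torch
from torch.nn.utils.rnn import pad_sequence

batch = [torch.tensor([1, 2, 3, 4, 5, 6]),
         torch.tensor([1, 2, 3, 4, 5, 6, 7, 8])]

padded = pad_sequence(batch, batch_first=True)  # pads with 0 up to length 8
print(padded.shape)  # torch.Size([2, 8])
print(padded[0])     # tensor([1, 2, 3, 4, 5, 6, 0, 0])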

The size/complexity of an LSTM/GRU model (the number of neurons, if you will, though it’s better to think in terms of trainable parameters) depends on:

  • the size of the input (e.g., 5 in your example)
  • the size of the hidden dimension
  • number of layers in case of a stacked LSTM/GRU
  • whether you use a uni- or bidirectional model.

It does not depend on the sequence lengths. Sure, processing takes more time for longer sequences, but the model itself stays the same size.
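You can check this by counting the trainable parameters, which stay the same no matter how long your sentences are (a quick sketch with your GRU configuration):

import torch.nn as nn

gru = nn.GRU(input_size=5, hidden_size=3, num_layers=4, batch_first=True)

n_params = sum(p.numel() for p in gru.parameters() if p.requires_grad)
print(n_params)  # 306 -- the same whether you feed 6-word or 8-word sequences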
