How to pass variable-length data to a Linear layer?

I am trying to do sequence classification by first passing data through an RNN and then through a Linear layer. Normally I would just reshape the RNN output from [batch_size, sequence_length, hidden_size] to [batch_size, sequence_length*hidden_size] before passing it to Linear, but in this case I have sequences of varying lengths: the output of the RNN might be, for example, [batch_size, 32, hidden_size] or [batch_size, 29, hidden_size]. So I don’t know what shape to initialize the Linear layer with (in place of the question marks in the code below). Is this possible at all?

```python
import torch
from torch import nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes=4):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size * ????, num_classes)  # what goes here?

    def forward(self, x):
        # x: [batch_size, sequence_length, input_size]
        out, h_n = self.rnn(x)  # out: [batch_size, sequence_length, hidden_size]
        out = torch.reshape(out, (out.size(0), -1))  # out: [batch_size, sequence_length*hidden_size]
        out = self.fc(out)  # out: [batch_size, num_classes]
        return out
```


Currently each batch is padded to the longest sequence in that batch. Would it be better to just pad all the sequences to one fixed length to get rid of this problem? Does changing the input shape of the Linear layer between batches cause bad side effects?
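For reference, this is roughly what I mean by padding everything to one fixed length (MAX_LEN here is a made-up constant, not something from my actual code):

```python
import torch
from torch import nn

MAX_LEN = 64  # hypothetical fixed maximum sequence length

def pad_to_fixed_length(batch, max_len=MAX_LEN):
    # batch: [batch_size, seq_len, input_size] with seq_len <= max_len
    pad_amount = max_len - batch.size(1)
    # pad only the sequence dimension (dim 1) on the right with zeros
    return nn.functional.pad(batch, (0, 0, 0, pad_amount))

# with a fixed length, the Linear layer could be initialized as
# nn.Linear(hidden_size * MAX_LEN, num_classes)
```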

This really depends on the specific task you’re trying to solve; I can’t think of any direct way to do this. Padding seems like one approach. Another approach would be to take just the last N elements from your RNN output and pass those to your Linear layer (where N is fixed and selected in a reasonable way, so that it has a chance to encode all the relevant information). A rough sketch of that second idea follows.
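Something along these lines (the class name and the value of N are made up for illustration):

```python
import torch
from torch import nn

class LastNRNN(nn.Module):  # hypothetical name, just a sketch
    def __init__(self, input_size, hidden_size, num_classes=4, n_last=8):
        super().__init__()
        self.n_last = n_last
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        # the Linear input size is now fixed regardless of sequence length
        self.fc = nn.Linear(hidden_size * n_last, num_classes)

    def forward(self, x):
        # x: [batch_size, seq_len, input_size], assuming seq_len >= n_last
        out, h_n = self.rnn(x)              # out: [batch_size, seq_len, hidden_size]
        out = out[:, -self.n_last:, :]      # keep only the last n_last timesteps
        out = out.reshape(out.size(0), -1)  # [batch_size, n_last * hidden_size]
        return self.fc(out)                 # [batch_size, num_classes]
```

One caveat: if your batches are right-padded, the "last" positions may be padding, so you might instead want to index the last n_last real timesteps of each sequence using its true length.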

Thank you for your reply. I am trying to solve a sentence classification task. I think it is possible to use dynamic batch shapes, because the BERT model does that, but I am not sure how to adjust the layers to do that.

I haven’t personally trained it; perhaps someone else has and can give a direct answer.

However, have you checked how other BERT PyTorch implementations deal with this, e.g. this one with 5k stars?

Looking at this:

```python
# paper noted they used 4*hidden_size for ff_network_hidden_size
self.feed_forward_hidden = hidden * 4
...
# multi-layers transformer blocks, deep network
self.transformer_blocks = nn.ModuleList(
    [TransformerBlock(hidden, attn_heads, hidden * 4, dropout) for _ in range(n_layers)])
```

So they’re using vanilla linear layers with a feed-forward size of 4 * hidden. It might be instructive to look at their code and make sure you’re on the same page regarding the dimensions of the tensors being passed through the model. Unless, of course, you’re trying a different type of implementation!
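For context, the feed-forward part of each transformer block typically looks roughly like this (a sketch of the standard transformer feed-forward, not that repo's exact code):

```python
import torch
from torch import nn

class FeedForward(nn.Module):  # sketch; the repo's class is named differently
    def __init__(self, hidden, dropout=0.1):
        super().__init__()
        # the paper uses 4 * hidden for the inner feed-forward dimension
        self.w_1 = nn.Linear(hidden, hidden * 4)
        self.w_2 = nn.Linear(hidden * 4, hidden)
        self.dropout = nn.Dropout(dropout)
        self.activation = nn.GELU()

    def forward(self, x):
        # x: [batch_size, seq_len, hidden]; Linear is applied to the last
        # dimension only, so the sequence length never enters the layer's
        # input size -- which is why transformers handle variable lengths
        return self.w_2(self.dropout(self.activation(self.w_1(x))))
```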

Good luck!
