DataParallel for RNN (using packing)

When I wrap nn.DataParallel around my models that contain LSTMs, I keep getting runtime errors when packing my padded sequences, because the sequence-lengths list that I pass in is not automatically split across the batch dimension.

```python
autoencoder = nn.DataParallel(autoencoder, dim=0)

...

# Method of autoencoder called in forward()
def encode(self, indices, lengths):
    embeddings = self.embedding(indices)
    # error happens during packing
    packed_embeddings = pack_padded_sequence(input=embeddings,
                                             lengths=lengths,
                                             batch_first=True)

    # Encode
    packed_output, state = self.encoder(packed_embeddings)
    hidden, cell = state
```

For example, for batch size 64, the word-index tensors I pass in are split across the batch dimension (32 each), but the sequence-lengths list that I pass into pack_padded_sequence still has length 64. I am using the batch dimension as the first dimension (dimension 0).

I’ve seen several posts related to this issue, but haven’t found a definitive solution for this. Do I have to make the sequence lengths list a tensor? Or do I have to make the batch dimension the second dimension (dimension 1)? Any suggestions?
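One CPU-only sketch of the first option: pack_padded_sequence itself accepts either a Python list or a 1-D tensor of lengths, and only a tensor would be scattered along dim 0 by DataParallel together with the batch. (The sizes below are illustrative, not from the original post.)

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

batch, max_len, d_embed = 4, 5, 3
embeddings = torch.randn(batch, max_len, d_embed)

# Lengths as a 1-D LongTensor instead of a list; a tensor is what
# DataParallel's scatter would split along dim 0 with the batch.
# (Older PyTorch requires lengths sorted in descending order.)
lengths = torch.tensor([5, 4, 2, 1])

packed = pack_padded_sequence(embeddings, lengths, batch_first=True)

# batch_sizes records how many sequences are still active at each step
print(packed.batch_sizes.tolist())  # -> [4, 3, 2, 2, 1]
```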

Hi Kelly, is your problem with DataParallel and packed sequences solved? I ran into the same out-of-range error…

Maybe you can wrap the lengths list in a LongTensor, then use the .cuda() method so it gets split across multiple GPUs; then, in the forward() method, cast the LongTensor back to a list. This works for my problem, though it is a bit inefficient.

Hello, can you help me? I am also confused by the same problem with DataParallel for RNNs (using packing). Any help will be appreciated.

Hello, can you explain in more detail? I am also confused by the same problem with DataParallel for RNNs (using packing). Any help will be appreciated.

Assume (x, len) is the input to your packed RNN, where x is a (batch_size, max_len, d_embed) tensor and len is a list of length batch_size. You can pass (x.cuda(), torch.LongTensor(len).cuda()) to distribute both across the GPUs, and then in the model (which has already been distributed across the GPUs), use (x, list(len)) as the input to packing. It’s a straightforward but inefficient method… Am I clear?
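The workaround above might look like the following minimal sketch (the module and its sizes are illustrative, not from the original post; the .cuda() and DataParallel lines are commented out so it also runs on CPU):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class PackedRNN(nn.Module):
    def __init__(self, d_embed=8, d_hidden=16):
        super().__init__()
        self.rnn = nn.LSTM(d_embed, d_hidden, batch_first=True)

    def forward(self, x, lengths):
        # lengths arrives as this replica's slice of the LongTensor;
        # cast it back to a plain list before packing
        packed = pack_padded_sequence(x, lengths.tolist(), batch_first=True)
        packed_out, (hidden, cell) = self.rnn(packed)
        return hidden

model = PackedRNN()
# model = nn.DataParallel(model, dim=0)  # when GPUs are available

x = torch.randn(4, 5, 8)                # (batch_size, max_len, d_embed)
lengths = torch.LongTensor([5, 4, 2, 1])  # tensor so scatter splits it too
# on GPU: hidden = model(x.cuda(), lengths.cuda())
hidden = model(x, lengths)              # hidden: (num_layers, batch, d_hidden)
```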


This answer might be helpful: