PackedSequence with DataLoader

Hey,

I’m trying to reproduce in PyTorch some previous RNN work I did with Theano.

I want to be able to mask the sequences I pass as input to an RNN based model. This should be easy enough…

However, there are a couple of annoying issues bugging me:

  1. PackedSequence inputs are only supported by RNNs, which means I have to call pack_padded_sequence and pad_packed_sequence back and forth constantly in order to have a model whose RNN layers interact with other types of layers.

  2. pack_padded_sequence requires the sequences to be sorted by length. This is pretty much the same as bucketing, and it constrains training by not allowing random sampling. Given this requirement, how can I combine a DataLoader with PackedSequence without having to fully sort my dataset by length?

And finally, one last question:
Is it possible to mask the loss function natively in PyTorch?

Cheers!

  1. Yes, for now you have to constantly pack and unpack if you’re mixing RNNs and conv layers. If you’re mixing RNNs and fully connected layers, you don’t have to unpack at all – you can actually call the linear layer directly on the packed sequence.
  2. The list just has to be sorted within an individual batch, so you can still shuffle your dataset and randomly sample a batch at a time, then sort each batch to send it to pack_padded_sequence.
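
For example, per-batch sorting might look like the following minimal sketch (assuming a reasonably recent PyTorch and a padded batch of shape (max_len, batch, features) coming out of a shuffled DataLoader; the variable names are made up for illustration):

import torch
from torch.nn.utils.rnn import pack_padded_sequence

# padded: one (max_len, batch, features) batch from a shuffled DataLoader
padded = torch.randn(5, 3, 2)
lengths = torch.tensor([3, 5, 1])   # original lengths of the 3 sequences

# sort this batch by length, longest first, as pack_padded_sequence expects
sorted_lengths, sort_idx = lengths.sort(descending=True)
padded = padded[:, sort_idx]        # reorder the batch dimension to match
packed = pack_padded_sequence(padded, sorted_lengths.tolist())
# sort_idx.argsort() gives the permutation to restore the original order later

On recent PyTorch versions (1.1+, if I remember correctly), pack_padded_sequence also accepts enforce_sorted=False, which does this per-batch reordering for you.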

What’s the best way to sort / rearrange tensors in PyTorch?


Is this correct usage?

import torch

# x: (seq_len=5, batch=3, features=2), already padded
x = torch.autograd.Variable(torch.randn(5, 3, 2))
l = torch.nn.Linear(2, 2)
r = torch.nn.RNN(2, 2)

# lengths must be sorted in decreasing order
px = torch.nn.utils.rnn.pack_padded_sequence(x, [5, 3, 1])
# apply the linear layer to the flat packed data, then re-wrap it
ph = torch.nn.utils.rnn.PackedSequence(l(px.data), px.batch_sizes)
py, h = r(ph)

Yes, that looks right to me.

If you’re mixing RNNs and fully connected layers, you don’t have to unpack at all – you can actually call the linear layer directly on the packed sequence.

Is this true? It is not working for me, since nn.Linear expects its input to have a ‘dim’ attribute, which a PackedSequence does not.


I’m observing the same behavior. I don’t think linear layers can take packed sequences as inputs.

I suppose you can do it by passing a custom collate_fn to the DataLoader.
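
For what it’s worth, here is a minimal sketch of such a collate_fn (assuming each dataset item is a (sequence, label) pair with a variable-length sequence tensor and a tensor label; pad_sequence is available in torch.nn.utils.rnn on reasonably recent versions):

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def sort_and_pad(batch):
    # batch: list of (sequence, label) pairs; sequences have different lengths
    batch.sort(key=lambda pair: len(pair[0]), reverse=True)   # longest first
    sequences, labels = zip(*batch)
    lengths = [len(seq) for seq in sequences]
    padded = pad_sequence(sequences)   # (max_len, batch, *) padded with zeros
    return padded, lengths, torch.stack(labels)

# loader = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=sort_and_pad)
# for padded, lengths, labels in loader:
#     packed = torch.nn.utils.rnn.pack_padded_sequence(padded, lengths)

The DataLoader still shuffles and batches as usual; only the within-batch order changes, so random sampling is preserved.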


It seems that the trick is to use .data when invoking nn.Linear(), as shown in @ShigekiKarita’s code.

From the documentation of torch.nn.utils.rnn.PackedSequence:

Instances of this class should never be created manually. They are meant to be instantiated by functions like pack_padded_sequence().

Is there any better way now to use a packed sequence with linear layers?
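
One workaround that respects that warning, assuming a PyTorch version in which PackedSequence is still a namedtuple-style class (and 0.4+, so no Variable wrapper; the names below are just for illustration), is to run the linear layer on the flat .data tensor and rebuild the packed sequence with ._replace(), so that batch_sizes (and the sorting indices on newer versions) are carried over unchanged:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

linear = torch.nn.Linear(2, 2)
rnn = torch.nn.RNN(2, 2)

x = torch.randn(5, 3, 2)                  # (max_len, batch, features)
px = pack_padded_sequence(x, [5, 3, 1])   # lengths sorted, longest first

# apply the linear layer to every non-padded timestep at once,
# then re-wrap it without touching the packing metadata
ph = px._replace(data=linear(px.data))
py, h = rnn(ph)

Since _replace comes from namedtuple, this avoids calling the PackedSequence constructor by hand, though it still relies on PackedSequence being a plain namedtuple under the hood.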