Accessing `.data` in PackedSequence when `batch_first=True`

When creating a variable-length `PackedSequence` with `batch_first=True`, accessing the `.data` attribute returns the sequences out of order, as if `batch_first=False`.

I don’t really understand the reasoning behind having the sequence be the first dimension by default (it seems less intuitive given how PyTorch otherwise deals with batches), but I’m assuming it is for performance reasons. Even so, given that the `.data` attribute is public-facing, I feel it should be returned in the same order as it was given. Then, those of us writing modules that use padded and packed sequences could deal with this input naturally, without hacking together even more re-ordering code. Requiring `pack_padded_sequence()` and `pack_sequence()` to receive sequences in decreasing order of length is enough of a hassle, but that’s another topic.

I wasn’t sure if this behavior was intended or not, so I’m posting here rather than making a bug report. But is this behavior correct? If so, why, and how does pytorch recommend dealing with this issue?

Code to Reproduce Behavior:

import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

batch_first_seqs = [
    torch.rand((3, 2)),
    torch.rand((2, 2)),
    torch.rand((1, 2))]
lengths = torch.LongTensor([3, 2, 1])
padded_seqs = pad_sequence(batch_first_seqs, batch_first=True)
packed_seqs = pack_padded_sequence(padded_seqs, lengths=lengths, batch_first=True)

print(batch_first_seqs)
print(padded_seqs)
print(packed_seqs)
print(packed_seqs.data == torch.cat(batch_first_seqs))


[tensor([[0.7967, 0.5329],
        [0.6376, 0.3543],
        [0.6514, 0.8007]]), tensor([[0.1709, 0.1577],
        [0.5007, 0.8083]]), tensor([[0.3345, 0.7590]])]

tensor([[[0.7967, 0.5329],
         [0.6376, 0.3543],
         [0.6514, 0.8007]],

        [[0.1709, 0.1577],
         [0.5007, 0.8083],
         [0.0000, 0.0000]],

        [[0.3345, 0.7590],
         [0.0000, 0.0000],
         [0.0000, 0.0000]]])

PackedSequence(data=tensor([[0.7967, 0.5329],
        [0.1709, 0.1577],
        [0.3345, 0.7590],
        [0.6376, 0.3543],
        [0.5007, 0.8083],
        [0.6514, 0.8007]]), batch_sizes=tensor([3, 2, 1]))

tensor([[1, 1],
        [0, 0],
        [0, 0],
        [0, 0],
        [1, 1],
        [0, 0]], dtype=torch.uint8)
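For what it’s worth, the behavior above is consistent with `.data` being stored in time-major order: all of the step-0 rows across the batch, then all of the step-1 rows, and so on, regardless of `batch_first`. A minimal sketch (assuming a recent PyTorch) that checks this layout, and uses `pad_packed_sequence(..., batch_first=True)` to round-trip back to the batch-first layout:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

seqs = [torch.rand(3, 2), torch.rand(2, 2), torch.rand(1, 2)]
lengths = torch.tensor([3, 2, 1])

padded = pad_sequence(seqs, batch_first=True)               # (batch, time, feat)
packed = pack_padded_sequence(padded, lengths, batch_first=True)

# .data is time-major: step-0 rows for all sequences, then step-1 rows, ...
time_major = torch.cat(
    [padded[:int(b), t] for t, b in enumerate(packed.batch_sizes)])
assert torch.equal(packed.data, time_major)

# pad_packed_sequence restores the batch-first padded layout and the lengths
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
assert torch.equal(unpacked, padded)
assert torch.equal(out_lengths, lengths)
```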


The `.data` field is kept for backward compatibility but should not be used at all.
Why do you need it? You should replace all uses of it with either `.detach()` to break the graph, or with `torch.no_grad()` to perform ops that are not tracked by the autograd engine.
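To illustrate the two replacements named above on a plain tensor (a minimal sketch, not specific to `PackedSequence`):

```python
import torch

x = torch.ones(3, requires_grad=True)

# .detach(): a view of the same values, but cut out of the autograd graph
y = x.detach()
assert not y.requires_grad

# torch.no_grad(): ops run inside the block are not tracked by autograd
with torch.no_grad():
    z = x * 2
assert not z.requires_grad

# by contrast, a normal op on x is tracked
w = x * 2
assert w.requires_grad
```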

Thanks for responding. I was using this as a bit of a hack to remove the padding from a padded sequence. Basically, I wanted to let a function parameter specify whether an input was padded or not, then break up that padded sequence into a list of variable-length sequences for use in a stateful layer (like an LSTM, or a GRU). And of course, I wanted to keep the batch dimension as the first dimension.

Knowing that the .data field shouldn’t be used at all, it makes sense that this isn’t the pytorch-intended way of doing this. In fact, it seems likely that the library would rather this not be done at all. It seems unavoidable when writing custom state-based layers, so I suppose I’ll just write my own utility functions to help clean things up.
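A sketch of such a utility: given the padded batch and the lengths (which you need for packing anyway), the padding can be stripped without touching `.data` at all. The helper name `unpad_sequence` here is my own for illustration, though recent PyTorch versions ship a similar utility under `torch.nn.utils.rnn`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def unpad_sequence(padded, lengths):
    """Split a batch-first padded tensor back into a list of
    variable-length sequences, using the known lengths."""
    return [padded[i, :l] for i, l in enumerate(lengths)]

seqs = [torch.rand(3, 2), torch.rand(2, 2), torch.rand(1, 2)]
lengths = [3, 2, 1]
padded = pad_sequence(seqs, batch_first=True)

recovered = unpad_sequence(padded, lengths)
assert all(torch.equal(a, b) for a, b in zip(seqs, recovered))
```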

Here’s hoping 1.0 cleans up this part of the library!

Hi @albanD,

Can you please elaborate as to why we should not use the .data attribute at all? I have encountered a scenario where using this attribute is the only way I can effectively pass data through my network, similar to How to process variable length sequence of images with CNN - #8 by 3nomis. In that discussion, it seems like we can use .data as a workaround for certain scenarios where it is needed. Is that approach actually valid? Do we actually need to use .detach() for each of these references to .data? I would much appreciate any further insights you may have regarding this.

@albanD most likely made the same mistake as I did here and confused the `PackedSequence.data` attribute (a legitimate structural field) with the deprecated `Tensor.data` one.
