Cannot move PackedSequence to GPU

I am having trouble moving a PackedSequence to my GPU after creating the PackedSequence by calling torch.nn.utils.rnn.pack_padded_sequence.

First, I create a PackedSequence by running:

packed = torch.nn.utils.rnn.pack_padded_sequence(data, lengths, enforce_sorted=False, batch_first=False)

where data and lengths are float and long tensors respectively, and both have CPU backends. At this point, packed.is_cuda returns false.

Later I move packed onto the GPU by running:

packed =

Here, device was created using device = torch.device('cuda'). At this point, packed.is_cuda returns true. But when I pass packed to the forward function in an RNN, it gives the following error:

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 ‘index’
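
To narrow down which tensor the error is complaining about, it may help to print the device of every field in the PackedSequence after the move. This is only a diagnostic sketch: the toy shapes and the field_devices helper are made up for illustration, and it falls back to the CPU when no GPU is present. If sorted_indices or unsorted_indices still report cpu while data reports cuda, that would be consistent with the 'index' argument in the error (batch_sizes is expected to stay on the CPU).

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Fall back to CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy batch: max length 4, batch of 3, 2 features per step.
data = torch.randn(4, 3, 2)          # float tensor on the CPU
lengths = torch.tensor([2, 4, 3])    # long tensor on the CPU, not sorted

packed = pack_padded_sequence(data, lengths, enforce_sorted=False, batch_first=False)
moved = packed.to(device)

def field_devices(p):
    """Illustrative helper: map each PackedSequence field to its device."""
    names = ("data", "batch_sizes", "sorted_indices", "unsorted_indices")
    return {n: getattr(p, n).device for n in names if getattr(p, n) is not None}

for name, dev in field_devices(moved).items():
    print(name, dev)
```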

When I rewrite my code to move data to the GPU before packing, everything works as expected. My conclusion is that either…) is broken, there’s something wrong with my CUDA configuration, or this is poorly documented intended behavior.

Is this intentional? Has anyone else had this problem?

To avoid the error, I’ve switched to moving my tensors to the GPU in my DataLoader collate_fn before calling pack_padded_sequence. For now, I’ve also switched to 0 workers in my DataLoader, since I heard that CUDA doesn’t interface with multi-threading well. This significantly slows my training code, so I’m looking for a more efficient workaround if one exists.
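
One possible way to keep multiple workers is to leave collate_fn CPU-only (pad there, return the lengths) and do the GPU move and the packing in the main training loop. The sketch below assumes a toy list of sequences and a small GRU, both invented for illustration. num_workers is left at 0 so the snippet runs anywhere, but since collate_fn never touches CUDA it should be safe to raise.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical dataset: variable-length float sequences with 2 features each.
sequences = [torch.randn(n, 2) for n in (2, 4, 3, 5)]

def collate_fn(batch):
    # CPU-only work, so it can run inside DataLoader worker processes.
    lengths = torch.tensor([len(seq) for seq in batch])  # long tensor, CPU
    padded = pad_sequence(batch)  # shape: (max_len, batch, features)
    return padded, lengths

# num_workers=0 here for portability; it can be increased in a real script.
loader = DataLoader(sequences, batch_size=2, collate_fn=collate_fn, num_workers=0)

rnn = nn.GRU(input_size=2, hidden_size=8).to(device)

for padded, lengths in loader:
    padded = padded.to(device)  # move the padded data first
    # Pack after the move; lengths stays on the CPU.
    packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)
    output, hidden = rnn(packed)
```

This mirrors the observation above that moving data to the GPU before packing works; the only change is where the move happens.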


I’ve run into the same issue, and it feels like a bug in PackedSequence, which looks like it has been fixed:

here, and if my reading of the GitHub info is correct, the fix should be available in PyTorch 1.4.0 at the latest (?)