I see that there is a pack_sequence utility function used with recurrent neural nets. There is a simple example demonstrating its usage, but it does not cover word embeddings. When I try the following, I get an error:
import torch
import torch.nn.utils.rnn as rnn_utils
import torch.nn as nn
embeddings = nn.Embedding(6,10)
lstm = nn.LSTM(10, 200,num_layers=1, bidirectional=True)
a = torch.Tensor([[1], [2], [3]])
b = torch.Tensor([[4], [5]])
c = torch.Tensor([[6]])
packed = rnn_utils.pack_sequence([a, b, c])
emb = embeddings(packed) # This line raises an error
Error message: TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not PackedSequence
If I had been successful with the line above, I was going to give its output as an input to the lstm.
A couple of months ago, the following (not the cleanest, but it even works for multiple args) worked:
def elementwise_apply(fn, *args):
    # pass .data for PackedSequence arguments, the argument itself otherwise
    return torch.nn.utils.rnn.PackedSequence(
        fn(*[(arg.data if isinstance(arg, torch.nn.utils.rnn.PackedSequence) else arg) for arg in args]),
        args[0].batch_sizes,
    )
Thank you so much for your response, Thomas. If you don't mind, could you explain your function a little bit? For example, could you tell me what fn is, what should be passed into args, etc.? If they are well-known things in PyTorch, I am sorry for my request, but I am also pretty new to PyTorch. If possible, a simple example demonstrating usage of this function would be awesome.
I tried the following but got an error:
import torch
import torch.nn.utils.rnn as rnn_utils
import torch.nn as nn
embeddings = nn.Embedding(6,10)
lstm = nn.LSTM(10, 200,num_layers=1, bidirectional=True)
a = torch.Tensor([[1], [2], [3]])
b = torch.Tensor([[4], [5]])
c = torch.Tensor([[6]])
packed = rnn_utils.pack_sequence([a, b, c])
emb = elementwise_apply(embeddings, packed)
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CPUFloatTensor instead (while checking arguments for embedding)
def simple_elementwise_apply(fn, packed_sequence):
    """applies a pointwise function fn to each element in packed_sequence"""
    return torch.nn.utils.rnn.PackedSequence(fn(packed_sequence.data), packed_sequence.batch_sizes)
What this does is a) apply fn to the .data (which is where the flattened sequence elements live) and b) return a packed sequence with the result and the "bookkeeping" of .batch_sizes.
The more elaborate version above does the same, but a) takes multiple arguments and b) when an argument is a packed sequence it passes the .data to fn, and otherwise the full argument.
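Putting this together, here is a sketch of the full pipeline. Note two fixes relative to the snippet above, which explain the RuntimeError: embedding indices must be long tensors, not float ones, and since the indices go up to 6, the embedding needs at least 7 rows rather than 6.

```python
import torch
import torch.nn as nn
import torch.nn.utils.rnn as rnn_utils

def simple_elementwise_apply(fn, packed_sequence):
    """applies a pointwise function fn to each element in packed_sequence"""
    return rnn_utils.PackedSequence(fn(packed_sequence.data), packed_sequence.batch_sizes)

embeddings = nn.Embedding(7, 10)  # indices below go up to 6, so we need 7 rows
lstm = nn.LSTM(10, 200, num_layers=1, bidirectional=True)

# embedding indices must be long tensors; sequences sorted by decreasing length
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5])
c = torch.tensor([6])

packed = rnn_utils.pack_sequence([a, b, c])
emb = simple_elementwise_apply(embeddings, packed)  # PackedSequence of embeddings
output, (h_n, c_n) = lstm(emb)                      # output is again a PackedSequence
```

The LSTM accepts the PackedSequence directly, so only the embedding lookup needs the helper.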
I am afraid it is.
As much as I sympathize, and a utility function "apply" would not hurt:
It seems that the use of packed sequences is relatively narrow, and in particular it is probably not terribly common to need to intermix operations on packed sequences a lot with other things. For example, I don't think HuggingFace's transformers library - quite popular for processing sequences - is using PackedSequence at all.
Quite likely, the "figuring out what is the right thing to do" is still much more difficult than "implementing it".
Is there a clear tutorial about how to properly call the sequence of torch.nn.utils.rnn.pad_sequence, pad_packed_sequence and pack_padded_sequence? I can't find one in the documentation.
There might not be…
So there are three equivalent representations here with different types:
1. List[Tensor]: a list of sequences.
2. Tensor: a single tensor of sequences with padding. Crucial extra information: the pad value, and whether it is seq-first or batch-first.
3. PackedSequence: an object containing packed sequences. Contains the extra information: batch sizes, indices from reordering.
So then the conversion functions all go between them, and you can just go by the type signatures to see which is appropriate: pad_sequence: 1 → 2, pad_packed_sequence: 3 → 2, pack_padded_sequence: 2 → 3, pack_sequence: 1 → 3.
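A minimal sketch of all four conversions on a toy batch (the sequences here are just illustrative, sorted longest-first as the functions expect by default):

```python
import torch
import torch.nn.utils.rnn as rnn_utils

# 1: List[Tensor], sequences of different lengths, longest first
seqs = [torch.tensor([1., 2., 3.]), torch.tensor([4., 5.]), torch.tensor([6.])]

# 1 -> 2: pad to a single (max_len, batch) tensor, pad value 0 by default
padded = rnn_utils.pad_sequence(seqs)

# 1 -> 3: pack the list directly
packed = rnn_utils.pack_sequence(seqs)

# 2 -> 3: pack a padded tensor; the lengths must be supplied explicitly
packed_from_padded = rnn_utils.pack_padded_sequence(padded, lengths=[3, 2, 1])

# 3 -> 2: unpad; also recovers the lengths
padded_again, lengths = rnn_utils.pad_packed_sequence(packed)
```

Both routes into a PackedSequence produce the same packed data, and unpadding round-trips back to the padded tensor.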