How to perform repeat padding for variable length data?

I have variable-length data and want to pack it into batches, padding each batch to the length of its longest sample by repeating the shorter samples.
For example:
[[0, 1, 2, 3, 4], [0, 1, 2]] => [[0, 1, 2, 3, 4], [0, 1, 2, 0, 1]]

You could use some rnn util functions:

import torch

x = [torch.tensor([0, 1, 2, 3, 4]), torch.tensor([0, 1, 2])]
x = torch.nn.utils.rnn.pack_sequence(x)
out = torch.nn.utils.rnn.pad_packed_sequence(x, batch_first=True)
print(out)
> (tensor([[0, 1, 2, 3, 4],
        [0, 1, 2, 0, 0]]), tensor([5, 3]))

The first return value is the padded tensor, while the second gives you the lengths before padding.

But this is zero padding, not repeat padding.

Oh sorry, I apparently missed the most important part of the question.
I’m not sure if there is a function for this, but this code snippet should work:

import torch

x = [torch.tensor([0, 1, 2, 3, 4]), torch.tensor([0, 1, 2])]
max_len = max([t.size(0) for t in x])
res = [torch.cat((t, t[:max_len-t.size(0)])) for t in x]

With repeat padding my attention is even worse, since there is now non-zero data in the padding.

Could you advise how to handle padding on the query side in multi-head attention?
I tried to build a mask, but got NaN in the softmax.

Could you explain your use case a bit more regarding the NaN output in softmax?

Well, I wanted to mask the attention along the query axis as well.
But with the default attn_mask setup it causes NaNs.
Google says it's because a row contains only -infs.
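A minimal sketch of that failure mode: if every entry of a softmax row is -inf (i.e. a fully masked query position), the row normalizes to 0/0 and produces NaN:

import torch
import torch.nn.functional as F

scores = torch.tensor([[0.5, 1.0, -float('inf')],
                       [-float('inf'), -float('inf'), -float('inf')]])  # second row fully masked
print(F.softmax(scores, dim=-1))
# tensor([[0.3775, 0.6225, 0.0000],
#         [   nan,    nan,    nan]])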

Now I've edited the source code of the multi-head attention forward like this:

if attn_mask is not None:
    attn_output_weights = attn_output_weights.view(bsz, num_heads, tgt_len, src_len)
    # fill masked positions with a small value instead of float('-inf'),
    # so fully masked rows no longer produce NaN in the softmax
    attn_output_weights = attn_output_weights.masked_fill(attn_mask, 1e-9)
    attn_output_weights = attn_output_weights.view(bsz * num_heads, tgt_len, src_len)

attn_output_weights = softmax(attn_output_weights, dim=-1)

And made the masks like this:

def get_mask_from_lengths_3d(batch_size, lengths_query, lengths_key, nheads):
    # start with a (batch, key_len, query_len) mask of zeros
    mask = torch.zeros(batch_size, lengths_key.max(),
                       lengths_query.max()).cuda()

    # mark padded key positions
    max_len = torch.max(lengths_key).item()
    ids = torch.arange(0, max_len, out=torch.cuda.LongTensor(max_len))
    mask[ids > lengths_key.unsqueeze(1) - 1] = 1

    mask = mask.transpose(1, 2)  # -> (batch, query_len, key_len)

    # mark padded query positions
    max_len = torch.max(lengths_query).item()
    ids = torch.arange(0, max_len, out=torch.cuda.LongTensor(max_len))
    mask[ids > lengths_query.unsqueeze(1) - 1] = 1

    # expand to (batch, nheads, query_len, key_len)
    return mask.unsqueeze(1).repeat(1, nheads, 1, 1).bool()


def generate_square_subsequent_mask_3d(batch_size, lengths_query, nheads):
    sz = lengths_query.max().item()
    # upper triangular mask: 1 for future positions
    mask = torch.triu(torch.ones(sz, sz), 1).cuda()
    mask = mask.unsqueeze(0).repeat(batch_size, 1, 1)

    # also mask padded query positions
    ids = torch.arange(0, sz, out=torch.cuda.LongTensor(sz))
    mask[ids > lengths_query.unsqueeze(1) - 1] = 1

    return mask.unsqueeze(1).repeat(1, nheads, 1, 1).bool()
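For reference, a minimal usage sketch of these helpers (assuming a CUDA device, since the functions call .cuda(), and hypothetical batch sizes and lengths):

import torch

batch_size, nheads = 2, 4
lengths_query = torch.tensor([5, 3]).cuda()
lengths_key = torch.tensor([7, 4]).cuda()

pad_mask = get_mask_from_lengths_3d(batch_size, lengths_query, lengths_key, nheads)
causal_mask = generate_square_subsequent_mask_3d(batch_size, lengths_query, nheads)

print(pad_mask.shape)     # torch.Size([2, 4, 5, 7])  -> (batch, nheads, query_len, key_len)
print(causal_mask.shape)  # torch.Size([2, 4, 5, 5])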

The alignment for one layer seems to be right.

With a mask value of float('-inf') it becomes NaN immediately.

How should I zero-pad 2D data?

import torch

x = [torch.randn(10, 10), torch.randn(5, 5)]
x = torch.nn.utils.rnn.pack_sequence(x, enforce_sorted=False)
out = torch.nn.utils.rnn.pad_packed_sequence(x, batch_first=True)

RuntimeError: The expanded size of the tensor (10) must match the existing size (5) at non-singleton dimension 1. Target sizes: [5, 10]. Tensor sizes: [5, 5]

In your example, dim1 should be equal, so you could pad the second tensor with F.pad:

import torch
import torch.nn.functional as F

F.pad(torch.randn(5, 5), (2, 3, 0, 0))

Note that I’ve used a padding of 2 and 3 for the “left” and “right” side of dim1, but you could of course also pad only on one side with 5 values or choose any other valid configuration.
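If both dimensions vary, a sketch of one possible approach (my own assumption: right/bottom zero-padding every sample to the largest size in the list, then stacking) could look like this:

import torch
import torch.nn.functional as F

x = [torch.randn(10, 10), torch.randn(5, 5)]
max_h = max(t.size(0) for t in x)
max_w = max(t.size(1) for t in x)

# F.pad takes (left, right, top, bottom) for the last two dims;
# here we pad only on the right and bottom
padded = [F.pad(t, (0, max_w - t.size(1), 0, max_h - t.size(0))) for t in x]
batch = torch.stack(padded)  # shape: (2, 10, 10)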


I finally tried your snippet, but it does not work if one sample is more than twice as long as another.
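A possible workaround (my own sketch, not from the thread): tile each sample with tensor.repeat until it covers max_len before cutting it down, which also handles samples that are more than twice as short as the longest one:

import torch

x = [torch.tensor([0, 1, 2, 3, 4, 5, 6]), torch.tensor([0, 1, 2])]
max_len = max(t.size(0) for t in x)

# repeat each sample ceil(max_len / len) times, then cut to max_len
res = [t.repeat((max_len + t.size(0) - 1) // t.size(0))[:max_len] for t in x]
batch = torch.stack(res)
print(batch)
# tensor([[0, 1, 2, 3, 4, 5, 6],
#         [0, 1, 2, 0, 1, 2, 0]])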