How to do padding based on lengths?

pinocchio · July 25, 2019, 6:16pm

I used torch.nn.utils.rnn.pad_sequence in a custom collate_fn for my DataLoader:

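import torch
from torch.nn.utils.rnn import pad_sequence

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def collate_fn_padd(batch):
    '''
    Pads a batch of variable-length sequences.

    note: it converts items to tensors manually here since the ToTensor transform
    assumes it takes in images rather than arbitrary tensors.
    '''
    ## get sequence lengths
    lengths = torch.tensor([t.shape[0] for t in batch]).to(device)
    ## pad; output is (max_len, batch_size, *) because batch_first=False by default
    batch = [torch.as_tensor(t, dtype=torch.float32).to(device) for t in batch]
    batch = pad_sequence(batch)
    ## compute mask (only correct if 0 never occurs as a real value)
    mask = (batch != 0)
    return batch, lengths, mask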
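The `batch != 0` mask marks genuine zeros as padding. Since the lengths are already computed, a minimal sketch (assuming the default time-first layout of `pad_sequence`) is to build the mask from `lengths` and, for RNNs, hand the lengths to `pack_padded_sequence`. The helper `length_mask` and the toy data are hypothetical, for illustration only.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

def length_mask(lengths, max_len):
    # Hypothetical helper: mask[t, b] is True where t < lengths[b].
    # Shape (max_len, batch_size), matching pad_sequence's default time-first layout.
    steps = torch.arange(max_len, device=lengths.device).unsqueeze(1)  # (max_len, 1)
    return steps < lengths.unsqueeze(0)                                 # (max_len, B)

# Toy batch: each step has 3 features; the first sequence is all real zeros.
seqs = [torch.zeros(4, 3), torch.ones(2, 3), torch.ones(3, 3)]
lengths = torch.tensor([s.shape[0] for s in seqs])
padded = pad_sequence(seqs)                   # (4, 3, 3)
mask = length_mask(lengths, padded.size(0))   # (4, 3)

# pack_padded_sequence expects CPU lengths; enforce_sorted=False avoids manual sorting.
rnn = torch.nn.GRU(input_size=3, hidden_size=5)
packed = pack_padded_sequence(padded, lengths.cpu(), enforce_sorted=False)
out_packed, h_n = rnn(packed)
# Unpack back to (max_len, B, hidden); positions past each length are filled with 0.
out, out_lengths = pad_packed_sequence(out_packed)

print(mask.T)
```

Here `(padded != 0)` would treat the all-zero first sequence as pure padding, while the length-based mask keeps all four of its steps.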

Many related posts:

Bucketing:

Tensorflow-esque bucket by sequence length
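Bucketing groups sequences of similar length into the same batch so that little padding is needed. A minimal sketch, assuming all lengths are known up front, is a batch sampler passed to `DataLoader(batch_sampler=...)`; `BucketBatchSampler` and `collate` are hypothetical names, not PyTorch APIs.

```python
import random
import torch
from torch.utils.data import DataLoader, Sampler
from torch.nn.utils.rnn import pad_sequence

class BucketBatchSampler(Sampler):
    """Hypothetical sampler: yields lists of indices whose sequences have similar lengths."""
    def __init__(self, lengths, batch_size, shuffle=True):
        self.lengths = lengths
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        # Sort indices by length so that neighbours in a batch need little padding.
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        batches = [order[i:i + self.batch_size] for i in range(0, len(order), self.batch_size)]
        if self.shuffle:
            random.shuffle(batches)  # shuffle the order of batches, not their contents
        return iter(batches)

    def __len__(self):
        return (len(self.lengths) + self.batch_size - 1) // self.batch_size

def collate(batch):
    # Pads to the longest sequence in this batch only; batch_first=True gives (B, max_len).
    lengths = torch.tensor([t.shape[0] for t in batch])
    return pad_sequence(batch, batch_first=True), lengths

data = [torch.randn(n) for n in [5, 2, 9, 3, 8, 1, 7, 4]]
sampler = BucketBatchSampler([len(t) for t in data], batch_size=2)
loader = DataLoader(data, batch_sampler=sampler, collate_fn=collate)
for padded, lengths in loader:
    print(padded.shape, lengths.tolist())
```

With a strict sort, batch membership is identical every epoch; a common variation is to sort only within larger random chunks so batches still change between epochs.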

Even Stack Overflow has a question about this:

crossposted: https://www.quora.com/unanswered/How-does-Pytorch-Dataloader-handle-variable-size-data