Using variable sized input - Is padding required?

pinocchio · July 25, 2019, 6:11pm

My take on how to solve this issue:

def collate_fn_padd(batch):
    '''
    Padds batch of variable length

    note: it converts things ToTensor manually here since the ToTensor transform
    assume it takes in images rather than arbitrary tensors.
    '''
    ## get sequence lengths
    lengths = torch.tensor([ t.shape[0] for t in batch ]).to(device)
    ## padd
    batch = [ torch.Tensor(t).to(device) for t in batch ]
    batch = torch.nn.utils.rnn.pad_sequence(batch)
    ## compute mask
    mask = (batch != 0).to(device)
    return batch, lengths, mask

There seems to be a large collection of posts all over pytorch that makes it difficult to solve this issue. I have collected a list of all of them hopefully making things easier for all of us. Here:

bucketing:

Tensorflow-esque bucket by sequence length

Also, Stack-overflow has a version of this question too:

crossposted: https://www.quora.com/unanswered/How-does-Pytorch-Dataloader-handle-variable-size-data