DataLoader for various length of data

GalAvineri · May 23, 2019, 9:36am

To answer the original question: you can pass the DataLoader a short, simple custom collate function that packs each batch with pack_sequence.

pack_sequence takes a list of unpadded tensors, and when called with enforce_sorted=False it does not require them to be sorted by length either, so it is simpler to use.
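
To make that concrete, here is a small sketch of what pack_sequence does with three unsorted sequences. The tensors are made-up toy values, not from the original question.

```python
import torch
from torch.nn.utils.rnn import pack_sequence

# Three toy sequences of lengths 2, 4 and 3, deliberately not sorted by length.
sequences = [
    torch.tensor([1, 2]),
    torch.tensor([3, 4, 5, 6]),
    torch.tensor([7, 8, 9]),
]

# No padding needed; enforce_sorted=False lets pack_sequence sort internally.
packed = pack_sequence(sequences, enforce_sorted=False)

# packed.data holds all 9 elements, laid out time step by time step.
# packed.batch_sizes says how many sequences are still "alive" at each step.
print(packed.data)
print(packed.batch_sizes)
```

With the default enforce_sorted=True, the same unsorted list would be rejected, which is why the flag matters for shuffled batches.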

Here is the collate function and the DataLoader setup (based on an answer to a similar question, "How to create a dataloader with variable-size input"):

from torch.nn.utils.rnn import pack_sequence
from torch.utils.data import DataLoader

def my_collate(batch):
    # batch is a list of (sequence, target) tuples, one per sample
    data = [item[0] for item in batch]
    # pack the variable-length sequences; enforce_sorted=False accepts them in any order
    data = pack_sequence(data, enforce_sorted=False)
    targets = [item[1] for item in batch]
    return [data, targets]

# ...
# later in your code, when you define your DataLoader, use the custom collate function
loader = DataLoader(dataset,
                    batch_size=batch_size,
                    shuffle=shuffle,
                    collate_fn=my_collate,  # use custom collate function here
                    pin_memory=True)
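
For completeness, here is a rough end-to-end sketch of how the packed batches might be consumed. Everything beyond the collate idea is an assumption for illustration: ToyDataset, the feature size of 8, the integer targets (which let me stack them into a tensor), and the LSTM are all hypothetical stand-ins for your own dataset and model. I left out pin_memory so it runs on a CPU-only machine.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset: each item is (tensor of shape (length, 8), int target)."""

    def __init__(self, lengths):
        self.items = [(torch.randn(n, 8), n % 2) for n in lengths]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


def my_collate(batch):
    # Same idea as above; targets are assumed to be ints, so they fit in one tensor.
    data = pack_sequence([item[0] for item in batch], enforce_sorted=False)
    targets = torch.tensor([item[1] for item in batch])
    return data, targets


loader = DataLoader(ToyDataset([5, 2, 7, 3]), batch_size=4, shuffle=False,
                    collate_fn=my_collate)
lstm = nn.LSTM(input_size=8, hidden_size=16)

for packed, targets in loader:
    # nn.LSTM accepts a PackedSequence directly and returns one for the outputs.
    packed_out, (h_n, c_n) = lstm(packed)
    # Unpack to a padded tensor of shape (max_len, batch, hidden) plus the true lengths.
    padded, lengths = pad_packed_sequence(packed_out)
    print(padded.shape, lengths.tolist(), h_n.shape)
```

The lengths returned by pad_packed_sequence come back in the original batch order, so they line up with targets without any manual re-sorting.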