Hi, I am trying to create a DataLoader that takes variable-length inputs; however, I get an error when doing this with a batch_size greater than 1.
I have heard that there are ways around this using torch.nn.utils.rnn.pack_sequence together with a custom collate_fn, but it is unclear to me how this works, so I have created a very simple example below: a PyTorch Dataset that produces random sequences of variable length.
I am unsure how to combine the collate_fn with torch.nn.utils.rnn.pack_sequence to create a DataLoader that accepts variable-length inputs.
For clarity, I intend to feed the batches into an LSTM, which is why the rnn.pack_sequence function looks relevant.
```python
import numpy as np
from numpy.random import rand
from random import randint
import torch
from torch.utils.data import DataLoader, Dataset


class SequenceFactory(Dataset):
    """A Dataset that spits out arrays with a random size between 1 and 8."""

    def __init__(self):
        max_len = 8
        no_of_sequences = 100
        # create a list of arrays of variable lengths between 1 and 8
        self.data = [rand(randint(1, max_len)) for seq in range(no_of_sequences)]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)  # 100


data = SequenceFactory()
dataloader = DataLoader(data, batch_size=2, shuffle=True)
next(iter(dataloader))
```

This raises:

```
RuntimeError: stack expects each tensor to be equal size, but got  at entry 0 and  at entry 1
```
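From what I understand so far, a collate_fn along these lines might be the right shape, but I'm not sure it's correct or idiomatic (the name collate_packed and the choice to add a trailing feature dimension are just my guesses):

```python
import torch
from torch.nn.utils.rnn import pack_sequence


def collate_packed(batch):
    # batch is a list of 1-D numpy arrays of varying lengths;
    # convert each to a float tensor and add a feature dimension of 1
    # so each sequence has shape (seq_len, 1), as an LSTM expects
    tensors = [torch.as_tensor(seq, dtype=torch.float32).unsqueeze(-1)
               for seq in batch]
    # enforce_sorted=False lets pack_sequence handle sequences that are
    # not already sorted by decreasing length
    return pack_sequence(tensors, enforce_sorted=False)
```

I would then pass it in as `DataLoader(data, batch_size=2, shuffle=True, collate_fn=collate_packed)` so each batch comes out as a PackedSequence, but I'd like confirmation that this is the intended approach.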