Custom DataSet class passed to loader not loading batches

As suggested by the title, I have a custom dataset which inherits from torch.utils.data.Dataset. Its `__getitem__` method returns a tuple of tensors `(piano_roll, tags, target)`.

# Dataset
train_dataset = dataset.pianoroll_dataset_batch('./datasets/training/piano_roll_fs1')
# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size,
                                           shuffle=False, drop_last=True)

However, when I set a `batch_size` other than 1, I get an error that makes no sense to me:

<class 'tuple'>: (<class 'RuntimeError'>, RuntimeError('invalid argument 0: Sizes of tensors must match except in dimension 0. Got 283 and 226 in dimension 1 at c:\programdata\miniconda3\conda-bld\pytorch_1533096106539\work\aten\src\thc\generic/THCTensorMath.cu:87',), None)

I don't know what 283 represents, but 226 is the first dimension of the `piano_roll` that should be returned from `train_loader` (its complete shape is `[226, 1, 128]`).

I guess that means that some of the training examples have different sizes than others. The DataLoader assumes that every example has the same shape, because that's how it stacks them into a batch.
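That constraint is easy to reproduce: for tensor fields, the default collate function effectively calls `torch.stack` on the samples, which requires identical shapes. A minimal sketch using the two sequence lengths from the error message:

```python
import torch

# Two samples whose first dimension differs (sequence lengths 226 vs 283),
# mimicking two piano rolls of shape [seq_len, 1, 128].
a = torch.zeros(226, 1, 128)
b = torch.zeros(283, 1, 128)

try:
    # This is essentially what DataLoader's default collation does per field.
    torch.stack([a, b])
except RuntimeError as e:
    print("stack failed:", e)  # sizes must match except in the new batch dim
```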

Hmm, I have 43 cases. Each case consists of a matrix of shape (128, sequence_length), where the sequence length is the number of seconds, so it varies from case to case. Surely the varying sequence length can't be what's causing the problem?

This is going to be a problem, though: how do you stack vectors or matrices of different dimensions into one batch tensor? It's impossible unless you do some padding or trimming.

For example, here are some ideas that may be helpful:
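One common option is to pass a custom `collate_fn` to the DataLoader that zero-pads the variable-length dimension before stacking. The sketch below is hypothetical (the function name `pad_collate` and the assumption that `tags` and `targets` have fixed shapes are mine, not from the question), and uses `torch.nn.utils.rnn.pad_sequence`:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_collate(batch):
    """Collate (piano_roll, tags, target) tuples where piano_roll has
    shape [seq_len, 1, 128] and seq_len varies between samples."""
    piano_rolls, tags, targets = zip(*batch)
    # Keep the original lengths so the model can mask out the padding.
    lengths = torch.tensor([pr.shape[0] for pr in piano_rolls])
    # pad_sequence zero-pads dim 0 to the max length and adds a batch dim:
    # result shape is [batch, max_seq_len, 1, 128].
    padded = pad_sequence(piano_rolls, batch_first=True)
    return padded, lengths, torch.stack(tags), torch.stack(targets)

# Usage (assuming fixed-shape tags/targets per sample):
# train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
#                                            batch_size=batch_size,
#                                            shuffle=False, drop_last=True,
#                                            collate_fn=pad_collate)
```

The alternative, trimming every sequence to the length of the shortest one in the batch, avoids padding but throws data away, so padding plus a length mask is usually the safer choice.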