Inconsistent tensor size while loading data using data loader

Hi @Bohan_Zhuang!

I’ve run into a similar error, but I’ve also noticed that it goes away when I disable shuffling. With shuffle=True my code fails with a similar error message, but setting the parameter to False returns a correct-looking batch of tensors.
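
Here is a minimal sketch (using my dataset class and the 'x'/'y' keys from the loop below) of a sanity check that could confirm whether some samples simply have different sizes, i.e. whether shuffling merely exposes the problem rather than causes it:

from collections import Counter

dataset = DroneRGBEarlier()

# Tally every (x shape, y shape) pair across the whole dataset.
shape_counts = Counter(
    (tuple(dataset[i]['x'].size()), tuple(dataset[i]['y'].size()))
    for i in range(len(dataset))
)
print(shape_counts)  # more than one distinct key means default_collate cannot stack a mixed batch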

Edit:

Here’s the error output.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-14-985e62e21c28> in <module>()
      1 dataloader = DataLoader(DroneRGBEarlier(), batch_size=4, shuffle=True)
      2 
----> 3 for i, batch in enumerate(dataloader):
      4     print(i, batch['x'].size(), batch['y'].size())
      5     break

C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    186         if self.num_workers == 0:  # same-process loading
    187             indices = next(self.sample_iter)  # may raise StopIteration
--> 188             batch = self.collate_fn([self.dataset[i] for i in indices])
    189             if self.pin_memory:
    190                 batch = pin_memory_batch(batch)

C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py in default_collate(batch)
    114         return batch
    115     elif isinstance(batch[0], collections.Mapping):
--> 116         return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
    117     elif isinstance(batch[0], collections.Sequence):
    118         transposed = zip(*batch)

C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py in <dictcomp>(.0)
    114         return batch
    115     elif isinstance(batch[0], collections.Mapping):
--> 116         return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
    117     elif isinstance(batch[0], collections.Sequence):
    118         transposed = zip(*batch)

C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py in default_collate(batch)
     94             storage = batch[0].storage()._new_shared(numel)
     95             out = batch[0].new(storage)
---> 96         return torch.stack(batch, 0, out=out)
     97     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     98             and elem_type.__name__ != 'string_':

C:\Anaconda3\envs\ml\lib\site-packages\torch\functional.py in stack(sequence, dim, out)
     62     inputs = [t.unsqueeze(dim) for t in sequence]
     63     if out is None:
---> 64         return torch.cat(inputs, dim)
     65     else:
     66         return torch.cat(inputs, dim, out=out)

RuntimeError: inconsistent tensor sizes at d:\pytorch\pytorch\torch\lib\th\generic/THTensorMath.c:2864

Edit 2:

After going through the source @smth posted, I figured it might help to shed some light on the dataset as well. My dataset is effectively a collection of inputs and targets; a single sample is a dict with the following structure:

sample = {
    'x': torch.DoubleTensor of size [3 x 32 x 32],
    'y': torch.FloatTensor of size [32 x 32]
}
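
For reference, if every sample really had exactly those sizes, default_collate should stack them without complaint. A minimal sketch with dummy tensors standing in for my real data (same dict structure and dtypes as above):

import torch
from torch.utils.data.dataloader import default_collate

# Four dummy samples with the structure above; values are random placeholders.
batch = [
    {'x': torch.randn(3, 32, 32).double(), 'y': torch.randn(32, 32).float()}
    for _ in range(4)
]

collated = default_collate(batch)
print(collated['x'].size())  # torch.Size([4, 3, 32, 32])
print(collated['y'].size())  # torch.Size([4, 32, 32])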

Edit 3:

Oddly enough, I noticed that with shuffle=True the code runs without problems as long as batch_size is at most 2; with any larger batch_size it fails.
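
In case it helps anyone, a possible stop-gap (just a sketch, not a real fix) is a custom collate_fn that drops samples whose shapes don't match the first sample in the batch, so the loader keeps running while I track down the offending entries:

from torch.utils.data import DataLoader
from torch.utils.data.dataloader import default_collate

def skip_mismatched(batch):
    # Keep only samples whose 'x'/'y' shapes match the first sample in the batch.
    # This hides the underlying data problem, so use it for debugging only.
    ref_x, ref_y = batch[0]['x'].size(), batch[0]['y'].size()
    kept = [s for s in batch if s['x'].size() == ref_x and s['y'].size() == ref_y]
    return default_collate(kept)

dataloader = DataLoader(DroneRGBEarlier(), batch_size=4, shuffle=True,
                        collate_fn=skip_mismatched)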