I’ve run into a similar error, but I’ve also noticed that I don’t get it when I disable shuffling. With shuffle=True my code fails with a similar error message, but setting the parameter to False returns a correct-looking batch of tensors.
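For reference, this is roughly how I toggle the behaviour (a minimal sketch; DroneRGBEarlier is my own dataset class, described further down):

from torch.utils.data import DataLoader

# With shuffle=True this fails with the RuntimeError below;
# with shuffle=False the loop prints a correct-looking batch.
dataloader = DataLoader(DroneRGBEarlier(), batch_size=4, shuffle=False)

for i, batch in enumerate(dataloader):
    print(i, batch['x'].size(), batch['y'].size())
    break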
Edit:
Here’s the error output.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-14-985e62e21c28> in <module>()
1 dataloader = DataLoader(DroneRGBEarlier(), batch_size=4, shuffle=True)
2
----> 3 for i, batch in enumerate(dataloader):
4 print(i, batch['x'].size(), batch['y'].size())
5 break
C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
186 if self.num_workers == 0: # same-process loading
187 indices = next(self.sample_iter) # may raise StopIteration
--> 188 batch = self.collate_fn([self.dataset[i] for i in indices])
189 if self.pin_memory:
190 batch = pin_memory_batch(batch)
C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py in default_collate(batch)
114 return batch
115 elif isinstance(batch[0], collections.Mapping):
--> 116 return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
117 elif isinstance(batch[0], collections.Sequence):
118 transposed = zip(*batch)
C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py in <dictcomp>(.0)
114 return batch
115 elif isinstance(batch[0], collections.Mapping):
--> 116 return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
117 elif isinstance(batch[0], collections.Sequence):
118 transposed = zip(*batch)
C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py in default_collate(batch)
94 storage = batch[0].storage()._new_shared(numel)
95 out = batch[0].new(storage)
---> 96 return torch.stack(batch, 0, out=out)
97 elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
98 and elem_type.__name__ != 'string_':
C:\Anaconda3\envs\ml\lib\site-packages\torch\functional.py in stack(sequence, dim, out)
62 inputs = [t.unsqueeze(dim) for t in sequence]
63 if out is None:
---> 64 return torch.cat(inputs, dim)
65 else:
66 return torch.cat(inputs, dim, out=out)
RuntimeError: inconsistent tensor sizes at d:\pytorch\pytorch\torch\lib\th\generic/THTensorMath.c:2864
Edit 2:
After going through the source @smth posted, I figured it might help to shed some light on the dataset as well. My dataset is effectively a collection of inputs and targets, and a single sample is a dict with the following structure:
sample = {
    'x': torch.DoubleTensor of size [3 x 32 x 32],
    'y': torch.FloatTensor of size [32 x 32]
}
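For context, the dataset’s __getitem__ roughly follows this shape (a simplified sketch; the actual image loading and preprocessing are omitted and the placeholder tensors are purely illustrative):

import torch
from torch.utils.data import Dataset

class DroneRGBEarlier(Dataset):
    def __getitem__(self, idx):
        # 'x' is a 3 x 32 x 32 input tile, 'y' is the 32 x 32 target for it
        x = torch.DoubleTensor(3, 32, 32)  # placeholder for the real input
        y = torch.FloatTensor(32, 32)      # placeholder for the real target
        return {'x': x, 'y': y}

    def __len__(self):
        return 100  # placeholder sample count

Since each sample is a dict, default_collate collates the 'x' tensors and the 'y' tensors separately, which is the torch.stack call that fails in the traceback above.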
Edit 3:
Oddly enough, I noticed that with shuffle=True the code runs without problems as long as batch_size is at most 2. With any larger batch_size it fails.
After drilling down and replicating the DataLoader’s tensor collation by hand, I found that the cause of my error was indeed in the batch’s target tensors: some of the y tensors have shape [32, 32] while others have [32, 33]. This wasn’t immediately obvious, since shuffling only surfaces these tensors by chance. With that, I rest my case and go back to fixing my code.
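For anyone hitting the same thing: a quick shape check along these lines (just a sketch, assuming samples are dicts like the one above) surfaces the offending targets without having to step through the collate code:

dataset = DroneRGBEarlier()
expected = dataset[0]['y'].size()
for i in range(len(dataset)):
    size = dataset[i]['y'].size()
    if size != expected:
        print('sample %d: y has size %s, expected %s' % (i, size, expected))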