Weird torch.cat error during training

Hi, I’m new to PyTorch but have largely been able to get the hang of things. However, I ran into a strange error while running my code: the network trains for 133 iterations of epoch 1 and then crashes with no obvious cause. The error message is below. The problem seems to occur during the image loading step. I have included the relevant functions in the following gist. I looked at the images and they seem fine. Any help would be hugely appreciated!

bash-4.2$ python learnFeaturePoseModel.py 2
Learning feature + pose models
Train: 1431898 Val: 476905 Test: 472303
0% (133 of 44747) | | Elapsed Time: 0:01:36 ETA: 8:58:48Traceback (most recent call last):
  File "learnFeaturePoseModel.py", line 152, in <module>
    main()
  File "learnFeaturePoseModel.py", line 133, in main
    train(train_loader, model, criterion, optimizer)
  File "learnFeaturePoseModel.py", line 51, in train
    for i, sample in enumerate(train_loader):
  File "/cis/home/msid/anaconda3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 188, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/cis/home/msid/anaconda3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 116, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "/cis/home/msid/anaconda3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 116, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
  File "/cis/home/msid/anaconda3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 96, in default_collate
    return torch.stack(batch, 0, out=out)
  File "/cis/home/msid/anaconda3/lib/python3.5/site-packages/torch/functional.py", line 64, in stack
    return torch.cat(inputs, dim)
TypeError: cat received an invalid combination of arguments - got (list, int), but expected one of:

  • (sequence[torch.FloatTensor] seq)
  • (sequence[torch.FloatTensor] seq, int dim)
    didn't match because some of the arguments have invalid types: (list, int)

I was able to track down the error by running the script under the debugger: python -m pdb learnFeaturePoseModel.py 0
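
For anyone who would rather not step through pdb, here is a rough sketch of a check that can be run on the list of samples just before they are collated. It assumes each sample is a dict, as the default_collate frames in the traceback suggest. The helper name find_inconsistent_keys and the keys "image" and "pose" are made up for illustration and are not from the original script.

```python
import torch


def find_inconsistent_keys(samples):
    """Return {key: set of (type, size)} for every key whose samples disagree.

    `samples` is a list of dicts, i.e. what the DataLoader would hand to
    its collate function for one batch.
    """
    seen = {}
    for sample in samples:
        for key, value in sample.items():
            if torch.is_tensor(value):
                # e.g. ("torch.FloatTensor", (3, 4, 4))
                signature = (value.type(), tuple(value.size()))
            else:
                signature = (type(value).__name__, None)
            seen.setdefault(key, set()).add(signature)
    # Keep only the keys where more than one signature was observed.
    return {key: sigs for key, sigs in seen.items() if len(sigs) > 1}


# Hypothetical two-sample batch; the second "pose" differs from the first.
samples = [
    {"image": torch.zeros(3, 4, 4), "pose": torch.zeros(3)},
    {"image": torch.zeros(3, 4, 4), "pose": torch.zeros(3).double()},
]
bad = find_inconsistent_keys(samples)
print(bad)
```

An empty result means every key has a consistent tensor type and size across the batch, so the problem would have to be somewhere else.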

One of the elements being concatenated was a torch.DoubleTensor, while all the others were torch.FloatTensor. The error message is still misleading, though, and I think this has already been reported in https://github.com/pytorch/pytorch/issues/2485
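
A minimal sketch of one way to avoid the mismatch, in case it helps someone else. It assumes the stray DoubleTensor came from a NumPy array, since NumPy defaults to float64 and torch.from_numpy keeps that dtype. That is only a guess about the cause here. The helper name to_float_tensor and the pose values are made up for illustration.

```python
import numpy as np
import torch


def to_float_tensor(array):
    """Convert array-like data to a torch.FloatTensor, whatever its input dtype."""
    # Casting to float32 first means from_numpy yields a FloatTensor.
    return torch.from_numpy(np.asarray(array, dtype=np.float32))


pose = np.array([0.1, 0.2, 0.3])  # NumPy picks float64 by default

double_type = torch.from_numpy(pose).type()
float_type = to_float_tensor(pose).type()
print(double_type)  # torch.DoubleTensor
print(float_type)   # torch.FloatTensor

# With every element converted the same way, stacking works as expected.
batch = torch.stack([to_float_tensor(pose), torch.zeros(3)], 0)
print(batch.size())
```

Calling .float() on the tensor inside the dataset's __getitem__ would do the same job. Either way, the conversion belongs where the sample is created, so every element reaching the collate step already has the same type.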