torch.utils.data.DataLoader causes a RuntimeError

What I want to do is load the COCO captions dataset with torch.utils.data.DataLoader.
I resized all COCO images to a fixed size and saved them in the data/train2014resized directory.

import torch
import torchvision.datasets as dset
import torchvision.transforms as transforms

cap = dset.CocoCaptions(root = './data/train2014resized',
                        annFile = './data/annotations/captions_train2014.json',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3]    # this works well

train_loader = torch.utils.data.DataLoader(
    cap, batch_size=1, shuffle=False, num_workers=1)
data_iter = iter(train_loader)

print(next(data_iter))    # this raises an error

When I ran the code above, I got a huge RuntimeError message. Below is the bottom of the error message.

File "/Users/yunjey/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 75, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/Users/yunjey/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 71, in default_collate
    elif isinstance(batch[0], collections.Iterable):
  File "/Users/yunjey/anaconda2/lib/python2.7/abc.py", line 132, in __instancecheck__
    if subclass is not None and subclass in cls._abc_cache:
  File "/Users/yunjey/anaconda2/lib/python2.7/_weakrefset.py", line 75, in __contains__
    return wr in self.data
RuntimeError: maximum recursion depth exceeded in cmp

How can I solve this problem?

Do you have unicode objects in your batches?

Yes, the COCO captions are of type unicode (not str), and that is what causes this problem. [torchvision.datasets.CocoCaptions](https://github.com/pytorch/vision/blob/master/torchvision/datasets/coco.py#L20) returns img and target, and target is a unicode object.
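
To illustrate why a text object can recurse forever in a collate function, here is a minimal sketch. It assumes a simplified stand-in for the Iterable branch seen in the traceback (naive_collate is a hypothetical name, not the real torch code) and runs on Python 3, where str behaves the way unicode did in Python 2.7: a one-character string is Iterable and iterating it yields the same one-character string again.

```python
from collections.abc import Iterable  # Python 2.7 exposed this as collections.Iterable


def naive_collate(batch):
    """Hypothetical, simplified stand-in for the Iterable branch of a collate function."""
    if isinstance(batch[0], Iterable):
        # Transpose the batch and collate each "column" recursively.
        transposed = zip(*batch)
        return [naive_collate(samples) for samples in transposed]
    return list(batch)


# Numbers end the recursion because an int is not Iterable.
print(naive_collate([(1, 2), (3, 4)]))

# A string is Iterable, and so is every character it yields,
# so the recursion never reaches a non-Iterable element.
try:
    naive_collate(["a caption"])
except RuntimeError as err:  # RecursionError is a subclass of RuntimeError in Python 3
    print("collate failed:", type(err).__name__)
```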

I solved this problem by converting the returned unicode to str.

There are two possible options to handle this problem.
i) Add code to torchvision.datasets.CocoCaptions so that it returns target as a str.
ii) Add code to default_collate to handle unicode objects (the problem right now is that the function treats unicode as a collections.Iterable).
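
Option i) could look roughly like the sketch below. The helper names to_native_str and captions_to_str are hypothetical and are not part of torchvision; the sketch only assumes that captions arrive either as a single text object or as a list of them, and that UTF-8 is an acceptable encoding.

```python
import sys

PY2 = sys.version_info[0] < 3


def to_native_str(caption):
    """Return the caption as the native str type (hypothetical helper)."""
    if PY2 and not isinstance(caption, str):
        # Python 2 only: encode a unicode object into a UTF-8 byte string (str).
        return caption.encode("utf-8")
    # Python 3, or already a str: nothing to convert.
    return caption


def captions_to_str(target):
    """Convert a single caption or a list/tuple of captions."""
    if isinstance(target, (list, tuple)):
        return [to_native_str(c) for c in target]
    return to_native_str(target)


print(captions_to_str([u"A man riding a wave.", u"A surfer on a board."]))
```

If your torchvision version's CocoCaptions accepts a target_transform argument, a function like captions_to_str could be passed there instead of editing the dataset class itself.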

Could I send a pull request to solve this problem?

If you could send a pull request for (1), that would be great. Thank you.

I think we should support unicode in default_collate, but this will need an additional guard on sys.version_info[0] < 3, because there is no unicode type in Python 3. I'll open an issue, and PRs are welcome :)
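
A rough sketch of such a guard, under stated assumptions: guarded_collate and string_types are hypothetical names, and the function is a simplified stand-in rather than the actual default_collate. Text objects are returned as-is before the Iterable check is reached.

```python
import sys

try:
    from collections.abc import Iterable  # Python 3
except ImportError:
    from collections import Iterable  # Python 2.7

if sys.version_info[0] < 3:
    # unicode only exists in Python 2, hence the version guard.
    string_types = (str, unicode)  # noqa: F821
else:
    string_types = (str, bytes)


def guarded_collate(batch):
    """Hypothetical collate that treats text as a leaf instead of an Iterable."""
    if isinstance(batch[0], string_types):
        # Keep captions as a plain list; do not recurse into characters.
        return list(batch)
    if isinstance(batch[0], Iterable):
        # Transpose the batch and collate each field separately.
        return [guarded_collate(samples) for samples in zip(*batch)]
    return list(batch)


batch = [(1, u"first caption"), (2, u"second caption")]
print(guarded_collate(batch))
```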

Thanks, I added the code and verified that it works in Python 2.7.
I have opened a pull request below.