Questions about Dataloader and Dataset

Thanks @smth @apaszke, that really gives me a deeper understanding of the DataLoader.

At first I tried:

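import torch as t
from PIL import Image
from torch.utils.data.dataloader import default_collate
from torchvision import transforms
from torchvision.datasets import ImageFolder

def my_loader(path):
    # Return None instead of raising when an image cannot be opened.
    try:
        return Image.open(path).convert('RGB')
    except Exception as e:
        print(e)
        return None

def my_collate(batch):
    "Puts each data field into a tensor with outer dimension batch size"
    # Drop failed samples; build a list because filter() is lazy in Python 3.
    batch = [x for x in batch if x is not None]
    return default_collate(batch)

dataset = ImageFolder('/home/x/train/',
                      transform=transforms.Compose([transforms.ToTensor()]),
                      loader=my_loader)
dataloader = t.utils.data.DataLoader(dataset, 4, True, collate_fn=my_collate)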

It raises an exception, because my_loader returns None for an unreadable image and the transforms in the dataset can't handle None.
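To make the failure concrete, here is a minimal sketch that needs no images. It assumes the dataset calls the loader first and then applies the transform to whatever the loader returned, which is the order ImageFolder uses. The names failing_loader, strict_transform and get_item are hypothetical stand-ins, not torchvision functions.

```python
def failing_loader(path):
    # Stand-in for my_loader on a broken file: the error is printed
    # and the function falls through, so it returns None.
    try:
        raise IOError("cannot identify image file %r" % path)
    except Exception as e:
        print(e)


def strict_transform(img):
    # Hypothetical stand-in for an image transform: it rejects
    # anything that is not a real image, including None.
    if img is None:
        raise TypeError("expected an image, got None")
    return img


def get_item(path, loader, transform):
    # Simplified order of operations inside the dataset:
    # load first, then transform the loaded sample.
    sample = loader(path)
    return transform(sample)
```

Calling get_item('broken.jpg', failing_loader, strict_transform) raises TypeError inside the dataset, so the None never reaches my_collate and the filter there has no chance to remove it.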

So then I tried this:

def my_collate(batch):
    # Drop samples that failed to load (returned as None by __getitem__).
    batch = [x for x in batch if x is not None]
    return default_collate(batch)

class MyImageFolder(ImageFolder):
    # __init__ is inherited from ImageFolder unchanged.
    def __getitem__(self, index):
        # Catch errors from both loading and transforming the sample.
        try:
            return super(MyImageFolder, self).__getitem__(index)
        except Exception as e:
            print(e)
            return None

dataset = MyImageFolder('/home/x/train/',
                        transform=transforms.Compose([transforms.ToTensor()]))  # other transforms elided
dataloader = t.utils.data.DataLoader(dataset, 4, True, collate_fn=my_collate)

Not so Pythonic, but it works.
Still, I think the best way may be to just clean the data.
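For the cleaning route, a rough sketch of a one-off scan that lists the files a loader cannot open, so they can be fixed or removed before training. The function name find_unreadable and the extension list are my own assumptions; the loader is passed in, so the scan itself has no dependencies beyond the standard library.

```python
import os


def find_unreadable(root, loader, extensions=(".jpg", ".jpeg", ".png")):
    """Return (path, error) pairs for files under root that loader cannot open."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Skip files that do not look like images.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                loader(path)
            except Exception as e:
                # Record the failure instead of stopping the scan.
                bad.append((path, repr(e)))
    return bad
```

With PIL, the loader could be the same call as in my_loader but without the try/except, for example lambda p: Image.open(p).convert('RGB'). Once the listed files are dealt with, the plain ImageFolder and the default collate function should work without returning None anywhere.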
