Inconsistent Data Loader Error

ColinConwell · December 17, 2018, 6:40am

After writing a custom data loader for the CelebA dataset (CelebADataset below), I’m encountering a strange error: the training subset of the dataset loads correctly, but the test subset throws the following (to me) incomprehensible error:

Here’s my code for the data loader:

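import os

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

def Crop(i, j, h, w):
    # Returns a callable that crops a fixed (top=i, left=j, height=h, width=w) box.
    def crop(img):
        return transforms.functional.crop(img, i, j, h, w)
    return crop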

class CelebADataset(Dataset):
    def __init__(self, root='../CelebADataset', train=True, transform=None, target_transform=None):
        self.root = os.path.expanduser(root)
        self.transform = transform
        self.target_transform = target_transform
        self.train = train
        self.split_file = os.path.join(root, 'NameOfFile.txt')
        
        # Use the expanded root so a path such as '~/data' resolves correctly.
        self.df = pd.read_csv(os.path.join(self.root, 'list_eval_partition.txt'), sep=' ', header=None, names=['filename', 'label'])
        if self.train:
            self.images = self.df[self.df.label==0].filename
        else:
            self.images = self.df[self.df.label==1].filename
            
    def __getitem__(self, index):
        filename = os.path.join(self.root, self.images[index])
        img = Image.open(filename)
        if self.transform:
            img = self.transform(img)
            
        return img

    def __len__(self):
        return len(self.images)

transform = transforms.Compose([Crop(38, 12, 146, 146), transforms.Resize((128,128)), transforms.ToTensor()])
train_dataset = CelebADataset(train=True, transform=transform)
test_dataset = CelebADataset(train=False, transform=transform)
# train_dataset length: 162770
# test_dataset length: 19867

train_loader = DataLoader(dataset=train_dataset, batch_size=16, num_workers=8, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=16, num_workers=8, shuffle=False)

The error occurs when I call the data loader like so:

imgs = next(iter(loader))

While the train loader works, the test loader does not.

I’ve examined the images in the dataset and can find no obvious reason for this discrepancy. (It’s worth noting that the images are definitely there, and seem to be accessible.)

I’d be much obliged for any advice!

ptrblck · December 17, 2018, 7:34am

It looks like pandas is throwing a KeyError for the index 19550.
Could you check whether this index is accessible as test_dataset.images[19550]?
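
As a rough illustration of what such a check can reveal, here is a minimal sketch on a made-up, seven-row partition table (the file names, the row counts, and the helper `has_label` are assumptions for the example, not taken from the real CelebA file). It shows that a boolean filter keeps the original row labels, so `series[key]` on the filtered Series only works for labels that survived the filter:

```python
import pandas as pd

# Hypothetical miniature partition table: 4 rows with label 0, 3 rows with label 1.
df = pd.DataFrame({
    "filename": [f"{i:06d}.jpg" for i in range(1, 8)],
    "label": [0, 0, 0, 0, 1, 1, 1],
})

train_images = df[df.label == 0].filename
test_images = df[df.label == 1].filename

# Boolean filtering does not renumber the rows: the index keeps the
# labels the rows had in the full DataFrame.
train_labels = list(train_images.index)
test_labels = list(test_images.index)

def has_label(series, key):
    """Return True if series[key] would find `key` among the index labels."""
    return key in series.index

print(train_labels, test_labels)
print(has_label(train_images, 0), has_label(test_images, 0))
```

In this toy table the label-0 subset happens to start at label 0, so integer lookups from 0 upward succeed, while the label-1 subset starts at label 4 and `test_images[0]` would raise a KeyError. The same pattern would be consistent with a train split that loads and a test split that does not.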

ColinConwell · December 17, 2018, 8:37am

Thanks for the rapid reply!

test_dataset.images[19550] also throws a key error.

Is there a way I can find out which image it’s pointing to, so that I can check whether that image is in the directory?

ColinConwell · December 17, 2018, 8:42am

To note, I’ve now tried test_dataset.images[] on a number of indices. All of them are throwing key errors.

ptrblck · December 17, 2018, 9:57am

Thanks for the info. In that case your pd.DataFrame seems to have some invalid indices.
Try printing its contents and checking which indices/keys are expected.
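
If printing the index shows labels that do not start at 0, one possible remedy is to look rows up by position or to renumber the filtered Series. The sketch below uses a made-up table again; `select_split` is a hypothetical helper, and whether this matches the actual cause in the code above is an assumption until the index has been inspected:

```python
import pandas as pd

# Hypothetical miniature partition table (not the real CelebA file).
df = pd.DataFrame({
    "filename": [f"{i:06d}.jpg" for i in range(1, 8)],
    "label": [0, 0, 0, 0, 1, 1, 1],
})

def select_split(frame, label):
    """Return the filenames for one split, renumbered from 0."""
    subset = frame[frame.label == label].filename
    # drop=True discards the old row labels instead of keeping them as a column.
    return subset.reset_index(drop=True)

# Option 1: keep the filtered Series as-is and index by position with .iloc.
test_images = df[df.label == 1].filename
first_by_position = test_images.iloc[0]

# Option 2: renumber once, after which plain integer lookups work from 0.
test_images_reset = select_split(df, 1)
first_after_reset = test_images_reset[0]

print(first_by_position, first_after_reset, len(test_images_reset))
```

Either option would let `__getitem__` receive indices in the range `0 .. len(dataset) - 1`, which is what the DataLoader passes in. Using `.iloc` changes only the lookup, while `reset_index(drop=True)` changes the stored Series once in `__init__`.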