KeyError when enumerating over dataloader

When enumerating over dataloaders I get the following error:

Traceback (most recent call last):
  File "train.py", line 218, in <module>
    main()
  File "train.py", line 109, in main
    train_valid(model, optimizer, scheduler, epoch, data_loaders, data_size, t)
  File "train.py", line 128, in train_valid
    for batch_idx, batch_sample in enumerate(dataloaders[phase]):
  File "/home/mhouben/miniconda3/envs/pytorch12/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/mhouben/miniconda3/envs/pytorch12/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/mhouben/miniconda3/envs/pytorch12/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/mhouben/miniconda3/envs/pytorch12/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/mhouben/miniconda3/envs/pytorch12/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/mhouben/miniconda3/envs/pytorch12/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/mhouben/miniconda3/envs/pytorch12/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/mhouben/miniconda3/envs/pytorch12/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
KeyError: 'anc_img'

I am running out of ideas on how to figure out what causes this error. Any thoughts on what could cause this? Or how I should approach this error?

Could you post the definition of your Dataset?
Also, you could try to iterate the dataset directly without using a DataLoader for debugging purposes.
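
For what it's worth, the last frames of the traceback show default_collate building the batch from the keys of the first sample (the `for key in elem` part), so a KeyError on 'anc_img' suggests that some sample in the batch is a dict without a key that the first sample has. Here is a minimal sketch of a check for that, assuming the dataset returns dicts. `find_key_mismatches` and the toy `samples` list are made-up names for illustration, not part of PyTorch or of the repo above.

```python
def find_key_mismatches(dataset, limit=None):
    """Return (index, missing_keys, extra_keys) for samples whose keys differ from sample 0."""
    n = len(dataset) if limit is None else min(limit, len(dataset))
    if n == 0:
        return []
    reference = set(dataset[0].keys())
    mismatches = []
    for idx in range(n):
        keys = set(dataset[idx].keys())
        if keys != reference:
            missing = sorted(reference - keys)
            extra = sorted(keys - reference)
            mismatches.append((idx, missing, extra))
    return mismatches


# Toy stand-in for a dataset: anything with __len__ and __getitem__ works.
samples = [
    {"anc_img": 1, "pos_img": 2},
    {"anc_img": 3, "pos_img": 4},
    {"pos_img": 5},  # this sample is missing 'anc_img'
]
mismatches = find_key_mismatches(samples)
print(mismatches)
```

If the dataset builds its samples randomly, a single pass may not hit the bad case, so it could be worth running the check a few times.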

My dataset consists of about 500k .png images, each with dimension 250x250x3 (CASIA Webface, which I aligned using https://github.com/davidsandberg/facenet/tree/master/src/align). Each epoch it takes 20k triplets of images from this dataset, 10k for training and 10k for validation.

I will try to iterate over the dataset directly and post a response once I am done.

I still have not found a solution for my error. I tried leaving out the DataLoader, but it seems like an awful lot of work to recreate how the DataLoader fills its batches. Is there another way besides writing my own DataLoader-like class with its own __iter__() implementation to fill the batches?

Perhaps it helps if I post my github repo: https://github.com/washizzle/facenet_pytorch

I call:
python train.py --train_format=.png --valid_format=.png --start-epoch=159 --pure_validation --load_pth_from=./log/20190814-221208/ --train-root-dir=/lustre2/0/wsdarts/datasets/vggface2_train/ --valid-root-dir=/lustre2/0/wsdarts/datasets/CASIA_aligned/ --train-csv-name=~/facenet_pytorch/datasets/train_vggface2.csv --valid-csv-name=~/facenet_pytorch/datasets/CASIA.csv

and it goes wrong on line 141 in train.py

I would just create a loop using your Dataset without wrapping it into a DataLoader:

dataset = ...
for idx, (data, image) in enumerate(dataset):
    print(idx)

This loop should raise the KeyError as soon as the failing sample is loaded. Using the last printed idx, you can narrow down which sample causes the issue.
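
A slightly extended sketch of the same idea: index the dataset directly and record every index whose `__getitem__` raises, instead of stopping at the first one. It only loads samples and never calls the model. `find_failing_indices` and `ToyDataset` are hypothetical names used for illustration; in practice you would pass your own Dataset instance.

```python
def find_failing_indices(dataset, max_failures=5):
    """Index the dataset directly and collect (index, error type, message) for failures."""
    failures = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]  # only load the sample, do not feed it to a model
        except Exception as exc:
            failures.append((idx, type(exc).__name__, str(exc)))
            if len(failures) >= max_failures:
                break
    return failures


class ToyDataset:
    """Stand-in dataset that fails on one specific index."""

    def __init__(self, n, bad):
        self.n = n
        self.bad = bad

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if idx == self.bad:
            raise KeyError("anc_img")
        return {"anc_img": idx}


failures = find_failing_indices(ToyDataset(10, bad=7))
print(failures)
```

Using `range(len(dataset))` rather than `for sample in dataset` keeps the loop going after an exception, so several bad samples can be found in one pass.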

Doing this results in the following error instead, for every image:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 64 3 7, but got 3-dimensional input of size [3, 182, 182]
I guess because it is missing the batch dimension.

You should use this loop to isolate the issue regarding loading a specific sample and the raised KeyError.
Once you’ve found the issue, you should use the DataLoader again.

Thank you for the tip

This helped me figure out my IndexError. I had added a sampler to my validation set by accident, and doing

next(iter(dataloaders['val']))

did the trick!

Hi, I am getting the same error. Here is the traceback. I am also able to iterate over the full dataset without any exceptions, as suggested by @ptrblck above.

Traceback (most recent call last):
  File "iBatchLearn.py", line 150, in <module>
    acc_table, task_names = run(args)
  File "iBatchLearn.py", line 78, in run
    agent.learn_batch(train_loader, val_loader)
  File "/home/shivam/ssaboo2020/Continual-Learning-Benchmark/agents/exp_replay.py", line 39, in learn_batch
    super(Naive_Rehearsal, self).learn_batch(new_train_loader, val_loader)
  File "/home/shivam/ssaboo2020/Continual-Learning-Benchmark/agents/default.py", line 222, in learn_batch
    self.validation(val_loader)
  File "/home/shivam/ssaboo2020/Continual-Learning-Benchmark/agents/default.py", line 115, in validation
    for i, (input, target, task) in enumerate(dataloader):
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/shivam/ssaboo2020/Continual-Learning-Benchmark/dataloaders/wrapper.py", line 51, in __getitem__
    img,target = self.dataset[index]
  File "/home/shivam/ssaboo2020/Continual-Learning-Benchmark/dataloaders/wrapper.py", line 81, in __getitem__
    img,target = self.dataset[self.indices[index]]
  File "/home/shivam/ssaboo2020/Continual-Learning-Benchmark/dataloaders/wrapper.py", line 33, in __getitem__
    img,target = self.dataset[index]
  File "/home/shivam/ssaboo2020/Continual-Learning-Benchmark/dataloaders/custom_datasets.py", line 22, in __getitem__
    img = Image.open(self.df['PATH'].iloc[index])
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/pandas/core/indexing.py", line 1500, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/pandas/core/indexing.py", line 2230, in _getitem_axis
    self._validate_integer(key, axis)
  File "/home/shivam/ssaboo2020/incremental/lib/python3.5/site-packages/pandas/core/indexing.py", line 2139, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

That’s strange, as an indexing error should be also raised without the DataLoader.
Anyway, could you print the passed index to your __getitem__ and use num_workers=0, to see which index is failing?
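
As an alternative to adding a print statement, here is a rough sketch of a thin wrapper that remembers the last index requested from the dataset. `IndexLoggingDataset` is a made-up name, and the sketch assumes num_workers=0: with worker processes each worker gets its own copy of the wrapper, so the attribute would not be updated in the main process.

```python
class IndexLoggingDataset:
    """Wraps a dataset and remembers the last index that was requested from it."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.last_index = None

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Record the index before delegating, so it survives an exception.
        self.last_index = index
        return self.dataset[index]


# Toy usage with a plain list standing in for the real dataset.
wrapped = IndexLoggingDataset([10, 20, 30])
failed_index = None
try:
    for i in [0, 1, 5]:
        wrapped[i]
except IndexError:
    failed_index = wrapped.last_index

print(failed_index, len(wrapped))
```

Comparing the failing index with `len(wrapped)` shows right away whether something upstream is asking for an index beyond the end of the dataset.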

I did that, and it fails at index 6021. Here is my simple custom dataset. The length of the dataframe is 6134. I tried removing the CSV entry at index 6021 and running it again, but the dataset fails at the same index.

class FlowerBirds(Dataset):

    def __init__(self, reverse=True, mode='train', transform=None):
        ROOT = 'data_stuf/'
        if reverse:
            self.df = pd.read_csv(ROOT+mode+'_'+'bf.csv')
        else:
            self.df = pd.read_csv(ROOT+mode+'_'+'fb.csv')
        self.tfm = transform
        #print(len(self.df), 'DF LEN')
        #self.normalize = transforms.Normalize(mean=[0.491, 0.482, 0.447], std=[0.247, 0.243, 0.262]) 

    def __getitem__(self, index):
        #print(index, "INDEX")
        img = Image.open(self.df['PATH'].iloc[index])
        label = self.df['LABEL'].iloc[index]
        img = self.tfm(img)
        if img.shape[0] == 1:
            img = img.repeat(3, 1, 1)
        #img = self.normalize(img)
        label = torch.LongTensor(np.array([label])).item()
        return img, label

    def __len__(self):
        return len(self.df)

Do you get a proper error message, when you call your Dataset directly with this index via:

dataset[6021]

No, I get proper output tensors instead. However, I am testing this dataset separately, and in my actual code it is wrapped by another dataset. Still, the traceback ends in this dataset, and the index at which it fails is valid when I test the dataset on its own, so I am unable to understand the problem.

The issue was with the way I was splitting my data into tasks. Fixed it. The error was in fact not related to the dataset or the DataLoader.

Thank you for your comment! It really helped me.
I got the same error, and it was because I was splitting my data and forgetting to reset the indexes after splitting.

Hello, I ran into the same problem. Could you tell me how to reset the indexes? Thanks a lot!!

Hello, Karen said the indexes should be reset, but I do not know how to do that. Could you tell me how to reset the indexes?

Thanks a lot!!

I’m not sure what @kalilamali means exactly by resetting the indexes after splitting.
Maybe the original indices (from the complete dataset) were used to index the Subsets, which would yield an IndexError.
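
One way to test this guess is to check that every index stored in a subset-like wrapper is a valid position in the dataset it is applied to. A minimal sketch, assuming the wrapper keeps its indices in a list (as the `self.indices` in the traceback above suggests). `out_of_range_indices` is a hypothetical helper, and the numbers are made up.

```python
def out_of_range_indices(indices, dataset_len):
    """Return the indices that are not valid positions 0..dataset_len-1.

    Negative indices are flagged too, since they usually point to a bug here.
    """
    return [i for i in indices if not 0 <= i < dataset_len]


full_len = 10                     # size of the complete dataset
val_indices = [6, 7, 8, 9]        # indices taken from the complete dataset
val_data = list(range(100, 104))  # the 4 samples that were split out

# Applying the original indices to the already-split data is out of bounds:
bad = out_of_range_indices(val_indices, len(val_data))
# Applying them to the complete dataset is fine:
ok = out_of_range_indices(val_indices, full_len)
print(bad, ok)
```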

Hi. The indexes I forgot to reset were in my df.
I was splitting my data like this:

if folds == 0:
    # Train and validation
    train, val = train_test_split(train_val, test_size=0.2, random_state=seed, shuffle=True)
    train, val = train.reset_index(drop=True), val.reset_index(drop=True)

And then I created this custom dataset, which also returns the index.

import torch
from PIL import Image
from torch.utils.data import Dataset

# Custom PyTorch dataset for this data
class Derm(Dataset):
    """
    Read a pandas dataframe with
    image paths and labels
    """
    def __init__(self, df, transform=None):
        self.df = df
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        # Load image data and get label.
        # Note: self.df[col][index] is a label-based lookup, so it only
        # matches positions 0..len-1 after reset_index(drop=True).
        # An IOError from Image.open is left to propagate so the failing
        # file is reported instead of leaving X undefined.
        X = Image.open(self.df['filenames'][index]).convert('RGB')
        #y = torch.tensor(self.df.iloc[index,2:])
        y = torch.tensor(self.df['label_code'][index])

        if self.transform:
            X = self.transform(X)
        # Sanity check
        print('id:', self.df['id'][index], 'label', y)
        return index, X, y

And then on the train/eval loop:

for index, inputs, labels in dataloader:
...
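
For anyone still wondering what resetting the indexes changes, here is a small sketch of the label-versus-position issue described above. It assumes pandas is installed and uses a made-up four-row dataframe with the same column names as in the dataset above. The split is done by hand with `iloc` instead of train_test_split to keep the example short, and `lookup_fails` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({
    "filenames": ["a.png", "b.png", "c.png", "d.png"],
    "label_code": [0, 1, 0, 1],
})

# Splitting keeps the original row labels: val has labels 2 and 3, not 0 and 1.
val = df.iloc[[2, 3]]


def lookup_fails(frame, index):
    """Return True if the label-based lookup frame['filenames'][index] raises KeyError."""
    try:
        frame["filenames"][index]
        return False
    except KeyError:
        return True


# A DataLoader asks for indices 0..len-1, but label 0 no longer exists in val.
fails_before = lookup_fails(val, 0)

# Option 1: reset the labels so they run from 0 to len-1 again.
val_reset = val.reset_index(drop=True)
fails_after = lookup_fails(val_reset, 0)

# Option 2: look up by position, which does not depend on the labels.
by_position = val["filenames"].iloc[0]

print(fails_before, fails_after, by_position)
```

Either option should avoid the error. Using `.iloc` inside `__getitem__` has the advantage that the dataset keeps working even if someone forgets to reset the index after a split.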