Weird behaviour by DataLoader

The DataLoader has a weird issue where it loads only up to the batch just below index 400. A check for errors in the data came back negative, and when I took a subset (350 to the end, including the index where the error occurred), it loaded just fine. I do not understand the issue and would appreciate any help.

But before that, let me explain the scenario a bit better.

The data, around 700 images, is saved as an array of dictionaries with the metadata in the following format:

>>> ct_dataset[50]

 {'Folder': '373::CT Thin Plain',
 'Label': 'Normal',
 'Error': 0,
 'Resample': array([[[-1024, -1024, -1024, ..., -1024, -1024, -1024],
         [-1024, -1024, -1024, ..., -1024, -1024, -1024],
         [-1024, -1024, -1024, ..., -1024, -1024, -1024],
         ...,
         [-1024, -1024, -1024, ..., -1024, -1024, -1024],
         [-1024, -1024, -1024, ..., -1024, -1024, -1024],
         [-1024, -1024, -1024, ..., -1024, -1024, -1024]],
 
        ...,
 
        [[-1024, -1024, -1024, ..., -1024, -1024, -1024],
         [-1024, -1024, -1024, ..., -1024, -1024, -1024],
         [-1024, -1024, -1024, ..., -1024, -1024, -1024],
         ...,
         [-1024, -1024, -1024, ..., -1024, -1024, -1024],
         [-1024, -1024, -1024, ..., -1024, -1024, -1024],
         [-1024, -1024, -1024, ..., -1024, -1024, -1024]]], dtype=int16)}

The 'Resample' entry has dimensions (100, 250, 250), which are uniform across the entire dataset.

But when I run the following loop to check the dataset:

for i, d in enumerate(dataloaders['train']):
    print(i, d)

I get the following error:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 100 and 0 in dimension 1 at /opt/conda/conda-bld/pytorch_1549635019666/work/aten/src/TH/generic/THTensorMoreMath.cpp:1307

But dimension 1 of the batch (i.e., dimension 0 of each individual sample) is 100 for all images. When I set shuffle=False on the DataLoader, I find that it stops just before index 400 of the dataset. But when I subset the dataset from 350 to the end to replicate the data error, it loads all the data properly. This confuses me a lot, since if this were a data error, it should have thrown the error at index 50 of the subset.
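For context, since each sample is a dictionary, the default collate function batches each key separately and stacks the 'Resample' arrays along a new batch dimension, which requires every non-batch dimension to match. A tiny standalone repro of the error (my own sketch, not the actual pipeline):

import torch

# Two fake volumes: one with the expected 100 slices, one empty.
good = torch.zeros(100, 250, 250, dtype=torch.int16)
bad = torch.zeros(0, 250, 250, dtype=torch.int16)

# This is effectively what the DataLoader's default collate does per
# batch; it raises a RuntimeError about mismatched sizes.
torch.stack([good, bad])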

Hope the issue makes sense; any advice would be much appreciated.


Could you add a print statement in the __getitem__ of your Dataset and check the current index?
This could make it easier to find the faulty sample that is causing the issue.
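Something along these lines (a minimal sketch; CTDataset, the key names, and the batch size are placeholders for your setup):

import torch
from torch.utils.data import Dataset, DataLoader

class CTDataset(Dataset):
    def __init__(self, records):
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        # The last index printed before the crash points at the
        # faulty sample.
        print('loading index', index)
        return torch.from_numpy(self.records[index]['Resample'])

# shuffle=False and num_workers=0 keep the loading serial and in
# order, so the printed indices are easy to follow.
loader = DataLoader(CTDataset(ct_dataset), batch_size=8,
                    shuffle=False, num_workers=0)

for i, batch in enumerate(loader):
    print(i, batch.shape)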

Got the issue. There was one sample with (0, 250, 250) dimensions. I don't know why it wasn't captured when I tried printing unique dims for all the data.
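In case it helps anyone else, an explicit per-sample check like this sketch (ct_dataset being the list of dictionaries from above, and the expected shape being an assumption) does flag the bad volume:

# Scan every sample and report any volume whose shape deviates
# from the expected (100, 250, 250).
expected = (100, 250, 250)
for i, sample in enumerate(ct_dataset):
    shape = sample['Resample'].shape
    if shape != expected:
        print(i, sample['Folder'], 'has shape', shape)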

Anyway, thank you for the guidance. I really appreciate it.

PS: I had to set num_workers=0 in the DataLoader to get a better understanding, since it made the loading serial.

Thanks again.