DataLoader hangs when calling next() with a custom dataset

I'm trying to use a custom dataset in the CocoDetection format. The cocoapi reports success on indexing and the code runs up to this point, but it hangs when calling next():

train_dataset = datasets.CocoDetection(args.image_path, args.data_path, transform=coco_transformer())

querry_dataloader = data.DataLoader(train_dataset, sampler=sampler, batch_size=args.batch_size, drop_last=True, num_workers=0)

labeled_data = self.read_data(querry_dataloader)

labeled_imgs, labels = next(labeled_data) #Hangs here

with read_data() defined as follows:

def read_data(self, dataloader, labels=True):
    # Infinite generator: restart the DataLoader whenever it is exhausted,
    # so next() on the generator always returns another batch
    if labels:
        while True:
            for img, label, _ in dataloader:
                yield img, label
    else:
        while True:
            for img, _, _ in dataloader:
                yield img

The source code I'm trying to implement can be found here: https://github.com/sinhasam/vaal
I'm also running this in the NVIDIA NGC PyTorch container, version 19.10, but the issue persists in the latest version.

Thanks in advance!

Are you able to get a single data sample using the Dataset without a DataLoader?

data, label, _ = train_dataset[0]

Yes; however, I do get an error:

Traceback (most recent call last):
  File "../vaal/main.py", line 143, in <module>
    main(args)
  File "../vaal/main.py", line 73, in main
    ndata, nlabel, _ = train_dataset[0]
ValueError: not enough values to unpack (expected 3, got 2)

When I use a custom dataset wrapper, however, as they do in the source code, I get past this point (I get the image and the label), but it freezes again at the same next() call.
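If I understand it correctly, torchvision's CocoDetection returns a plain (image, target) pair per sample, so unpacking three values from it fails, while my wrapper below adds the index as a third element:

data, target = train_dataset[0]          # works: CocoDetection yields (image, target)
data, target, index = train_dataset[0]   # ValueError: not enough values to unpack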

Custom dataset:

import numpy
from torch.utils.data import Dataset
from torchvision import datasets


class KFuji(Dataset):
    def __init__(self, image_path, json_path):
        # coco_transformer() is defined elsewhere in the project
        self.kfuji = datasets.CocoDetection(root=image_path, annFile=json_path, transform=coco_transformer())

    def __getitem__(self, index):
        # Indices may arrive as numpy floats (e.g. from a numpy-based sampler), so cast them to ints
        if isinstance(index, numpy.float64):
            index = index.astype(numpy.int64)
        data, target = self.kfuji[index]

        return data, target, index

    def __len__(self):
        return len(self.kfuji)

Could you try to iterate the dataset once over all samples, just to make sure there isn't a hidden bug somewhere that makes the DataLoader hang?


It iterates through the data just fine, using:

for i in range(len(train_dataset)):
    data, label, _ = train_dataset[i]

Turns out my batch size was larger than the initial subset of the data used, which caused the DataLoader to hang: with drop_last=True, a DataLoader whose batch size exceeds the number of sampled items yields no batches at all, so the infinite loop in read_data() never yields and next() never returns.
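A minimal sketch of why this shows up as a hang rather than an error (toy TensorDataset instead of the real COCO data):

import torch
from torch.utils.data import DataLoader, TensorDataset

tiny = TensorDataset(torch.zeros(3, 1))                  # only 3 samples
loader = DataLoader(tiny, batch_size=8, drop_last=True)  # batch size > dataset size
print(len(loader))  # 0 -> the for loop in read_data() never runs,
                    # so `while True` spins forever and next() never returns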

I have the same problem (July 2022). The DataLoader with multiple workers hangs after the first batch. Every once in a while it runs fine. I iterated over the entire dataset without problems.

I get the same problem when setting num_workers > 0. I also tried setting a timeout and skipping the batch, but after the timeout is reached the first time, every subsequent batch = next(dataloader_iter) call fails with the same timeout error.

I did, of course, try going through the whole dataset once without any problems.
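In case it helps, this is roughly the workaround I was attempting (a sketch with placeholder batch size and worker count; it assumes the stall comes from a stuck worker and accepts that rebuilding the iterator restarts the sampling for the epoch):

from torch.utils.data import DataLoader

loader = DataLoader(train_dataset, batch_size=8, num_workers=4, timeout=60)
it = iter(loader)
while True:
    try:
        batch = next(it)
    except StopIteration:
        break                 # epoch finished
    except RuntimeError:      # DataLoader raises RuntimeError when a worker times out
        it = iter(loader)     # fresh iterator -> fresh worker processes
        continue
    # ... training step on `batch` ...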