Can't iterate over a split DataLoader

Hello, I’m completely new to PyTorch, and here is my problem.

  1. I create a Dataset with ImageFolder.
  2. Split the indices with sklearn's train_test_split.
  3. Make 2 Subsets.
  4. Make a new MyDataset class like here, because I want to apply different transforms to each subset.
  5. Create 2 DataLoaders.
  6. When I try to take one sample from a DataLoader (with next(iter(DataLoader))), nothing happens. The code runs endlessly, with no errors and no warnings.

Here is the code:

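from pathlib import Path

from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Dataset, Subset
from torchvision import datasets

# train_path and data_transforms are defined earlier (not shown)

# make Dataset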
train_data = datasets.ImageFolder(root=train_path)

# class name of every image (used to stratify the split)
all_classes = list(
    (x.parent.name for x in sorted(
        Path('train/simpsons_dataset/').rglob('*.jpg'))))

# split indices
train_indices, val_indices = train_test_split(
    range(len(train_data)),
    train_size = 0.8,
    stratify=all_classes,
    random_state=42)

# make Subsets
train_subset = Subset(train_data, train_indices)
val_subset = Subset(train_data, val_indices)

# Dataset class for Subsets
class MyDataset(Dataset):
    def __init__(self, subset, transform=None):

        self.subset = subset
        self.transform = transform
        
    def __getitem__(self, index):
        img, label = self.subset[index]
        if self.transform:
            img = self.transform(img)
        return img, label
        
    def __len__(self):
        return len(self.subset)

# make new Datasets
train_dataset = MyDataset(train_subset, data_transforms['train'])
val_dataset = MyDataset(val_subset, data_transforms['val'])

# make Dataloaders
train_dataloader = DataLoader(train_dataset, batch_size=64, num_workers=2, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=64, num_workers=2, shuffle=False)
train_dataloader.dataset.transform

# finally, try to take one batch from the DataLoader
img, label = next(iter(train_dataloader))
print(f"Image shape: {img.shape} ")
print(f"Label shape: {label.shape}")

If I iterate over train_data = datasets.ImageFolder(root=train_path) directly, everything works fine.
Please help!

Based on your description, the problem seems to be in the DataLoader itself. I would first iterate the DataLoader object directly to see whether that also hangs (I assume it would), and then isolate the issue further, e.g. by setting num_workers to 0.
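
A minimal sketch of that isolation step, under stated assumptions: random tensors stand in for the ImageFolder data, and TransformedSubset is a hypothetical stand-in for the MyDataset wrapper from the question. With num_workers=0 every sample is loaded in the main process, so an error in the dataset code would surface as a normal traceback rather than a silent hang.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset, TensorDataset


class TransformedSubset(Dataset):
    """Hypothetical stand-in for MyDataset: wraps a subset and applies a transform."""

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __getitem__(self, index):
        img, label = self.subset[index]
        if self.transform:
            img = self.transform(img)
        return img, label

    def __len__(self):
        return len(self.subset)


# Synthetic stand-in for the image folder: 20 random 3x8x8 "images", 2 classes.
images = torch.rand(20, 3, 8, 8)
labels = torch.arange(20) % 2
full_data = TensorDataset(images, labels)

# Take the first 16 samples as a "train" subset and wrap it with a toy transform.
train_subset = Subset(full_data, list(range(16)))
train_dataset = TransformedSubset(train_subset, transform=lambda x: x * 2.0)

# num_workers=0: batches are loaded in the main process, with no worker processes.
loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=0)

# Check 1: fetch a single batch.
img, label = next(iter(loader))
print(f"Image shape: {tuple(img.shape)}")
print(f"Label shape: {tuple(label.shape)}")

# Check 2: iterate the loader directly.
num_batches = sum(1 for _ in loader)
print(f"Batches: {num_batches}")
```

If both checks pass with num_workers=0 but the same code hangs with num_workers=2, the dataset logic itself is probably fine and the problem lies in how the worker processes are started.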

@ptrblck_de,
Thank you for helping. I’m not sure I understand you correctly. What do you mean here?

trying to iterate the DataLoader object directly

You’ve mentioned that calling next(iter(loader)) seems to hang, so I would also like to double-check whether:

for data in loader:
    ...

would also hang, as I assume it would.

It’s the same with:

for data in loader:
    ...

I set num_workers=0 and everything works! Thank you so much!
Can you please explain why? Before this I always set num_workers=2 and had no problems with it.
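
The thread does not settle the cause, so the following is an assumption and not a diagnosis. With num_workers > 0 the DataLoader starts worker processes. On platforms where Python uses the spawn start method (Windows and macOS), each worker re-imports the main module, so the dataset class has to be defined at the top level of an importable module and the loading code should sit under an `if __name__ == "__main__":` guard. A class defined in a notebook cell may not be visible to spawned workers, which is often reported as a hang like this one. A sketch of the guarded layout, where RandomImages and first_batch are hypothetical names and random tensors replace the real images:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomImages(Dataset):
    """Hypothetical dataset, defined at module top level so workers can import it."""

    def __init__(self, n=32):
        self.images = torch.rand(n, 3, 8, 8)
        self.labels = torch.arange(n) % 2

    def __getitem__(self, index):
        return self.images[index], self.labels[index]

    def __len__(self):
        return len(self.labels)


def first_batch(num_workers):
    """Fetch one batch and return the image and label shapes."""
    loader = DataLoader(
        RandomImages(),
        batch_size=8,
        num_workers=num_workers,
        # With workers, raise an error after 30 s instead of waiting forever.
        # The timeout has to stay 0 when loading in the main process.
        timeout=30 if num_workers > 0 else 0,
    )
    img, label = next(iter(loader))
    return tuple(img.shape), tuple(label.shape)


if __name__ == "__main__":
    # Guarded entry point: spawned workers re-import this file without
    # re-running the loading code below.
    print(first_batch(num_workers=2))
```

Setting a timeout is a cheap way to turn a silent hang into an exception while debugging. If the script form works with num_workers=2 but the notebook does not, moving the dataset class into a separate .py file and importing it is a common workaround.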