Why does the PyTorch DataLoader load an image num_workers times?

I was plotting the images loaded by my data loader using an iterator, and I observed that each image is repeated num_workers times in the iterator.

I am also using batch_size=1. Is there a relationship between the batch size and the maximum value num_workers can have? Should num_workers be less than or equal to the batch_size?

Initialising the dataloader:

train_dataloader = DataLoader(train_datagen1, shuffle=True, num_workers=2, batch_size=1)

Iterating through the images:

dataiter = iter(train_dataloader)
images, labels = next(dataiter)

Just to confirm:

  1. What type of dataset is train_datagen1? Is it an IterableDataset? If so, the duplicated data is expected: without per-worker sharding, every worker iterates over the full dataset. For more info, have a look at torch.utils.data β€” PyTorch 1.10 documentation.
  2. Maybe it’s not a problem with the DataLoader. Ensure the images in train_datagen1 are not themselves duplicated.
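To illustrate point 1, here is a minimal sketch with a hypothetical RangeDataset (not from the original post) showing that an unsharded IterableDataset yields every item once per worker, and how torch.utils.data.get_worker_info() can be used to shard it. Assumes a platform where worker processes can be forked (e.g. Linux):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader, get_worker_info

class RangeDataset(IterableDataset):
    """Hypothetical IterableDataset yielding 0..n-1, optionally sharded per worker."""
    def __init__(self, n, shard=False):
        self.n = n
        self.shard = shard

    def __iter__(self):
        info = get_worker_info()  # None in the main process, per-worker info otherwise
        if self.shard and info is not None:
            # Each worker yields only its own interleaved slice of the data.
            return iter(range(info.id, self.n, info.num_workers))
        # Without sharding, every worker yields the full range -> duplicates.
        return iter(range(self.n))

# With 2 workers and no sharding, each item is returned twice.
naive = sorted(b.item() for b in
               DataLoader(RangeDataset(4), num_workers=2, batch_size=1))
# With sharding, each item is returned exactly once.
sharded = sorted(b.item() for b in
                 DataLoader(RangeDataset(4, shard=True), num_workers=2, batch_size=1))
print(naive)    # [0, 0, 1, 1, 2, 2, 3, 3]
print(sharded)  # [0, 1, 2, 3]
```

A map-style Dataset (one that implements __getitem__ and __len__) does not have this issue, because the sampler indices are split across workers automatically.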