next(iter(dataloader)) does NOT stop; returns the same files again


When using my DataLoader, I have two issues:

  1. The dataloader doesn’t stop.
    TEST_SET has a length of 101, but I can iterate 1000 times and more …
    Shouldn’t there be a StopIteration?

– My CustomDataset contains a __len__ which (if I print it out within __getitem__) seems to work.

  2. Output is redundant
    I saved the DataLoader output to a CSV file.

Quite a lot of images appear more than twice, and 0000003.png is not the only duplicated file.

This shouldn’t be the case, right?
What could be the reason for this?

I am using the torchvision.datasets.Kitti dataset, just adjusted a bit to my own needs.


train_dataloader = DataLoader(dataset=train_data,
                              collate_fn=None,  # use the default collate function
                              batch_size=1,     # how many samples per batch?
                              num_workers=1,    # how many subprocesses to use for data loading?
                              shuffle=True)     # shuffle the data?

This is how I have tested my DataLoader:

import pandas as pd

image_list = []
for count in range(0, 1001):
    a = next(iter(train_dataloader))
    image_list.append(a)
    if count % 100 == 0:
        print(f"{count} of 1000")

print(f"Last index is: {count}")
liste = pd.DataFrame(image_list)


You are repeatedly creating a new iterator of the DataLoader in a = next(iter(train_dataloader)). Every time a new iterator is created, it starts reading from the beginning; and since shuffle=True reshuffles for each fresh iterator, each call returns a random first batch, which is why your CSV contains duplicates.

Instead, you should create it once (unless you actually want to read from the beginning multiple times).

This should work for your use case:

for a in train_dataloader:
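The same iterator semantics apply to any Python iterable, so the bug can be reproduced without torch at all; a minimal sketch with a plain list standing in for the dataset:

```python
# Stand-in for a Dataset/DataLoader: any Python iterable behaves the same way.
data = ["img0.png", "img1.png", "img2.png"]

# Bug: iter() builds a *fresh* iterator on every call, so next() always
# returns the first element and StopIteration is never reached.
repeated = [next(iter(data)) for _ in range(5)]
print(repeated)  # ['img0.png', 'img0.png', 'img0.png', 'img0.png', 'img0.png']

# Fix: create the iterator once; it is exhausted after len(data) steps.
it = iter(data)
collected = []
while True:
    try:
        collected.append(next(it))
    except StopIteration:  # raised exactly once the data runs out
        break
print(collected)  # ['img0.png', 'img1.png', 'img2.png']
```

With shuffle=True the DataLoader additionally reshuffles for every fresh iterator, so instead of 1000 copies of one file you get random first batches, i.e. many duplicates.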

Thank you very much @nivek
Works! :smiley:

Btw, is there a way to get the length and the current index out? So that I can implement something like:

 if INDEX % 10 == 0:
        print(f"{INDEX} of {length}")

You can use enumerate like this:

for i, a in enumerate(train_dataloader):
    if i % 100 == 0:
        print(f"{i} of {len(train_dataloader)}")
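Note that len(train_dataloader) counts batches, not samples: with drop_last=False (the default) it is ceil(len(dataset) / batch_size), so with batch_size=1 the two numbers happen to coincide. A quick sketch of the arithmetic, using the dataset length of 101 from the question:

```python
import math

dataset_len = 101  # length reported by the custom Dataset's __len__
batch_size = 1

# number of batches the DataLoader yields (drop_last=False)
num_batches = math.ceil(dataset_len / batch_size)
print(num_batches)  # 101 -- same as the dataset only because batch_size=1

# with a larger batch size the loader is shorter than the dataset
print(math.ceil(dataset_len / 4))  # 26 batches of up to 4 samples
```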