next(iter(dataloader)) does NOT stop; returns the same files again

MikeTensor · February 28, 2023, 5:50pm

Hello!

When using
next(iter(train_dataloader))

I have two issues:

The dataloader doesn’t stop.
TEST_SET has a length of 101, yet I can iterate 1000 times and more.
Shouldn't there be a StopIteration?

My CustomDataset defines __len__, which seems to work (I checked by printing it inside __getitem__).

The output contains duplicates.
I saved the DataLoader output to a CSV file:

[Screenshot from 2023-03-01 02-43-13]

Quite a lot of images appear more than once; 0000003.png is not the only duplicated file.

This shouldn't be the case, right?
What could be the reason for this?

I am using the torchvision.datasets.Kitti dataset (Torchvision main documentation), slightly adjusted to my own needs.

################

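from torch.utils.data import DataLoader

train_dataloader = DataLoader(dataset=train_data,
                              collate_fn=None,   # None = use the default collate function
                              batch_size=1,      # how many samples per batch?
                              num_workers=1,     # how many subprocesses to use for data loading (0 = main process)
                              pin_memory=True,   # page-locked memory for faster host-to-GPU copies
                              shuffle=True)      # reshuffle the data at every epoch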

This is how I have tested my DataLoader:

import pandas as pd

image_list = []
for count in range(0, 1001):
    a = next(iter(train_dataloader))
    image_list.append(a)
    if count % 100 == 0:
        print(f"{count} of 1000")

print(f"Last index is: {count}")
liste = pd.DataFrame(image_list)
liste.to_csv('myDataTest/image_list.csv')

nivek · February 28, 2023, 6:04pm

Hi,

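You are creating a new DataLoader iterator on every call to a = next(iter(train_dataloader)). Every new iterator starts a fresh pass (epoch) over the dataset, so next() only ever returns the first batch of that pass. A single iterator raises StopIteration once it is exhausted, but each fresh one is thrown away after its first batch, so the loop never ends on its own. And because you use shuffle=True, every fresh pass has a different random order, which is why the same files keep showing up in your CSV.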
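A rough pure-Python simulation of the test loop, assuming shuffle=True and batch_size=1: each fresh iterator corresponds to a new random permutation of the 101 files, and only its first element is kept. That is effectively sampling with replacement, so among 1001 draws at least 900 must be repeats, because there are only 101 distinct files. The file names are hypothetical, and random.shuffle stands in for the DataLoader's sampler.

```python
import random

N_FILES = 101    # length of TEST_SET in the question
N_CALLS = 1001   # iterations of the test loop (range(0, 1001))

# Hypothetical file names in the same style as 0000003.png
files = [f"{i:07d}.png" for i in range(N_FILES)]

draws = []
for _ in range(N_CALLS):
    order = files[:]          # a fresh "epoch" ...
    random.shuffle(order)     # ... with its own shuffled order
    draws.append(order[0])    # next(iter(...)) only ever sees the first item

distinct = len(set(draws))
print(f"{len(draws)} draws, {distinct} distinct files, {len(draws) - distinct} repeats")
```

The exact counts vary from run to run, but the number of repeats can never drop below 1001 - 101 = 900.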

Instead, you should create it once (unless you actually want to read from the beginning multiple times).
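A minimal sketch of creating the iterator once, assuming a hypothetical TensorDataset of 101 integers in place of the KITTI-based dataset (num_workers=0 and no pin_memory, to keep it CPU-only). Reusing the same iterator visits each sample exactly once and then raises StopIteration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 101 samples whose content is their own
# index, so duplicates would be easy to spot.
dataset = TensorDataset(torch.arange(101))
loader = DataLoader(dataset, batch_size=1, shuffle=True, num_workers=0)

it = iter(loader)   # create the iterator ONCE
seen = []
while True:
    try:
        (batch,) = next(it)   # each batch holds one tensor of shape (1,)
    except StopIteration:     # raised after len(loader) batches
        break
    seen.append(batch.item())

print(len(seen), len(set(seen)))  # 101 101
```

A plain `for` loop over the DataLoader does exactly this: it calls iter() once and stops on StopIteration.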

This should work for your use case:

image_list = []
for a in train_dataloader:
    image_list.append(a)
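If you do want to read from the beginning multiple times, a sketch (again assuming a hypothetical 101-sample TensorDataset) is to loop over the DataLoader once per epoch. Each pass builds its own iterator, covers every sample exactly once, and, with shuffle=True, uses a new random order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 101 samples, each holding its own index.
dataset = TensorDataset(torch.arange(101))
train_dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

epochs = []
for epoch in range(2):              # a new iterator is created on each pass
    image_list = []
    for (a,) in train_dataloader:   # ends via StopIteration after one epoch
        image_list.append(a.item())
    epochs.append(image_list)

for image_list in epochs:
    # Every epoch contains each sample exactly once.
    print(len(image_list), sorted(image_list) == list(range(101)))
# 101 True
# 101 True
```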

MikeTensor · February 28, 2023, 6:14pm

Thank you very much, @nivek.
It works!

By the way, is there a way to get the length and the current index, so that I can implement my progress print?

if INDEX % 10 == 0:
    print(f"{INDEX} of {length}")

nivek · February 28, 2023, 6:15pm

You can use enumerate like this:

for i, a in enumerate(train_dataloader):
    image_list.append(a)
    if i % 100 == 0:
        print(f"{i} of {len(train_dataloader)}")
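Note that len(train_dataloader) counts batches, not samples. For a map-style dataset with the default sampler it is ceil(len(dataset) / batch_size), or the floor when drop_last=True; with batch_size=1, as in the question, it equals the 101 samples. A small check, assuming a hypothetical 101-sample TensorDataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(101))  # hypothetical 101-sample dataset

results = {}
for bs in (1, 4, 32):
    keep_last = DataLoader(dataset, batch_size=bs)                  # partial last batch kept
    drop_last = DataLoader(dataset, batch_size=bs, drop_last=True)  # partial last batch dropped
    results[bs] = (len(keep_last), len(drop_last))
    print(bs, results[bs])
# 1 (101, 101)
# 4 (26, 25)
# 32 (4, 3)
```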