How does PyTorch decide the size of a dataset?

It seems that the iteration behavior of a Dataset depends on how the idx variable is used in __getitem__. Is this expected or a bug?

import torch
from torch.utils.data import Dataset
class MyDataset(Dataset):
    def __init__(self):
        self.A = torch.randn(3,4)

    def __len__(self):
        return 3

    def __getitem__(self, idx):
        d = self.A[idx]  # commenting out this line leads to an infinite loop
        return 0


face_dataset = MyDataset()

for i, sample in enumerate(face_dataset):
    print(i)

As expected, the output is

0
1
2

However, when the line d = self.A[idx] is commented out, the loop never stops unless Ctrl+C is pressed.

...
1457
1458
....

This is unexpected because the size of the dataset is already defined in __len__(). Is this a bug, and how can I avoid it?

Iterating a raw Dataset directly relies on Python's fallback sequence protocol: the for loop keeps calling __getitem__ with increasing indices and only stops once an IndexError is raised (which the loop turns into a StopIteration); __len__ is never consulted. With d = self.A[idx] in place, indexing the 3-row tensor with idx >= 3 raises that error, so the loop ends after 3 items. Without it, __getitem__ succeeds for every index and the loop runs forever. Wrap the Dataset in a DataLoader, or make sure an out-of-bounds index raises.
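
A minimal sketch of both remedies, reusing the MyDataset class from the question (the explicit IndexError check is added here for illustration and is not part of the original code):

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.A = torch.randn(3, 4)

    def __len__(self):
        return 3

    def __getitem__(self, idx):
        # Raise explicitly so that bare iteration over the Dataset stops;
        # without this (or an out-of-bounds tensor index such as self.A[idx]),
        # __getitem__ succeeds for every idx and a plain for loop never ends.
        if idx >= len(self):
            raise IndexError(idx)
        return 0

face_dataset = MyDataset()

# Fix 1: the bare loop now terminates because __getitem__ raises IndexError.
for i, sample in enumerate(face_dataset):
    print(i)  # prints 0, 1, 2

# Fix 2: DataLoader draws indices from a sampler built from __len__,
# so it stops after 3 items regardless of what __getitem__ does.
loader = DataLoader(face_dataset, batch_size=1)
for i, batch in enumerate(loader):
    print(i)  # prints 0, 1, 2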

Thanks! Wrapping the dataset in a DataLoader solved the problem.