Confusion with __len__ for custom dataloading

I’m currently trying to load a dataset that is split into separate sub-folders (one per video sequence), and the number of images in each sub-folder varies. I assume that __len__ should return the length of the entire dataset. But since the training images are split into sequences stored in different folders, would this affect how __len__ works in PyTorch?

It depends how you are loading each sample.
The __len__ method returns the length of the complete Dataset and thus also defines the range of valid indices (0 to len - 1) that are passed into __getitem__.
E.g. if you are loading single images via the passed index in __getitem__, the length should be the total number of images across all sub-folders in your dataset.
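Here is a minimal sketch of the two options, assuming a hypothetical layout of root/<sequence_folder>/<image files>. The class names FrameDataset and SequenceDataset are made up for illustration, and the image decoding step is left out. Both are plain classes so the snippet runs without PyTorch installed. In real code you would usually subclass torch.utils.data.Dataset, but a map-style dataset only needs __len__ and __getitem__.

```python
from pathlib import Path


class FrameDataset:
    """One sample per image: __len__ counts every frame in every sub-folder."""

    def __init__(self, root):
        self.samples = []  # flat list of (frame_path, sequence_index)
        seq_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
        for seq_idx, seq_dir in enumerate(seq_dirs):
            for frame in sorted(f for f in seq_dir.iterdir() if f.is_file()):
                self.samples.append((frame, seq_idx))

    def __len__(self):
        # Total number of images over all sequences, so the valid indices
        # for __getitem__ are 0 .. len(self) - 1.
        return len(self.samples)

    def __getitem__(self, index):
        frame_path, seq_idx = self.samples[index]
        # Hypothetical: decode the image here (e.g. with PIL) and apply transforms.
        return frame_path, seq_idx


class SequenceDataset:
    """One sample per sub-folder: __len__ counts sequences, not images."""

    def __init__(self, root):
        self.sequences = [
            sorted(f for f in seq_dir.iterdir() if f.is_file())
            for seq_dir in sorted(p for p in Path(root).iterdir() if p.is_dir())
        ]

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, index):
        # Returns all frame paths of one sequence. Once decoded into tensors,
        # sequences of different lengths cannot be stacked by the default
        # collate, so batching would need padding or a custom collate_fn.
        return self.sequences[index]
```

With two sub-folders holding 3 and 2 images, len(FrameDataset(root)) is 5 and len(SequenceDataset(root)) is 2. A DataLoader samples indices from range(len(dataset)), so the folder structure only matters through how you map an index to a sample. If you prefer one dataset object per sequence, torch.utils.data.ConcatDataset combines them, and its length is the sum of the individual lengths.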
