I would expect list(dataset) to return 500 elements, but somehow __len__ is ignored.
The same problem happens when you iterate over the dataset with a for loop.
I mean, how does it even know that there are 1000 elements?
I guess list() ends up calling __getitem__ for each element.
The Dataset class is pretty much an abstract class in which one has to implement the __getitem__ function. The __len__ function allows one to report a custom size for the dataset, separate from the size of the actual underlying iterable.
Indeed, it calls __getitem__. But list() calls __getitem__ with more indices than expected.
In my first example the __len__ returns 500. I’d expect list() to call __getitem__ 500 times. But it calls it 1000 times.
Correct. And that is exactly why I am confused. How does it even know that it should call it 5 times (or in fact even 6!)?
__getitem__ doesn’t tell how many items there are. list() doesn’t know that self.arr is the iterable it needs to iterate over. I thought that it would check len(dataset) first and then iterate over the length of the dataset. This sounds logical to me.
And I fail to understand how it iterates over more items…
Thanks to @user_123454321 we have an answer.
Indeed, __len__ doesn’t influence how an iterable is converted to a list or iterated over in for loops. In fact, __getitem__ has to raise an IndexError, which signals the end of the iteration. This is also described in the __getitem__ docs.
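For anyone landing here later, a minimal sketch of the behaviour described above. It uses a plain Python class rather than a real torch Dataset, on the assumption that a class with __getitem__ but no __iter__ is iterated the same way; FixedLenDataset, enforce_len and the calls counter are made-up names for illustration only.

```python
class FixedLenDataset:
    """Plain-Python stand-in for a map-style dataset (hypothetical name)."""

    def __init__(self, n_items=1000, reported_len=500, enforce_len=False):
        self.arr = list(range(n_items))
        self.reported_len = reported_len
        self.enforce_len = enforce_len
        self.calls = 0  # counts how often __getitem__ is invoked

    def __len__(self):
        return self.reported_len

    def __getitem__(self, index):
        self.calls += 1
        if self.enforce_len and index >= len(self):
            # Signal the end of iteration at the reported length.
            raise IndexError(index)
        # Without the check above, the underlying list raises IndexError
        # only once index reaches n_items.
        return self.arr[index]


loose = FixedLenDataset()
items = list(loose)
print(len(loose), len(items), loose.calls)  # 500 1000 1001

strict = FixedLenDataset(enforce_len=True)
print(len(list(strict)), strict.calls)  # 500 501
```

The extra call in each case is the one that raises IndexError, which would also explain the "5 times (or in fact even 6!)" observation.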
Actually, I have been using this discrepancy between the actual length of the iterable and what len returns to randomly select an equal number (say 300) of augmented data points per epoch, regardless of the actual number of images, using something like
This may work under certain circumstances. But I think for loops and list(dataset) will run forever, because there is no IndexError and hence no end to the iteration.
PS. Assuming the implementation is like this: act_index = random.randint(0, len(self.images)-1)
Minus 1 is missing.
Yeah, it would go infinite, but I don’t call list on it. And considering the number of images, I think I would crash RAM in the original (non-infinite) case.
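A rough sketch of the random-sampling pattern discussed above, and of two ways to keep iteration bounded. This is not the poster's actual code: RandomSampleDataset, epoch_size and the integer "images" are assumptions for illustration, and only the act_index line is taken from the thread.

```python
import itertools
import random


class RandomSampleDataset:
    """Hypothetical sketch: returns a random image regardless of the index."""

    def __init__(self, images, epoch_size=300):
        self.images = images
        self.epoch_size = epoch_size

    def __len__(self):
        # Reported size of one epoch, not the number of images.
        return self.epoch_size

    def __getitem__(self, index):
        # The index is ignored, so IndexError is never raised here.
        # random.randint includes both endpoints, hence the -1.
        act_index = random.randint(0, len(self.images) - 1)
        return self.images[act_index]


dataset = RandomSampleDataset(images=list(range(10_000)))

# Bounded: drive the indices from len(dataset) explicitly.
epoch = [dataset[i] for i in range(len(dataset))]

# Also bounded: cap direct iteration instead of calling list(dataset),
# which would never terminate for this class.
capped = list(itertools.islice(dataset, len(dataset)))

print(len(epoch), len(capped))  # 300 300
```

As far as I know, a DataLoader with the default sampler draws its indices from range(len(dataset)), which would be why this pattern works there even though a plain for loop over the dataset does not stop.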