A map-style dataset in PyTorch implements the __getitem__() and __len__() protocols, while an iterable-style dataset implements __iter__(). With a map-style dataset we can access the data with dataset[idx], which is great; with an iterable-style dataset we can't.
My question is: why was this distinction necessary? What makes random reads of the data so expensive, or even impractical?
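To make the two protocols concrete, here is a minimal sketch in plain Python (class names are my own; in real code you would subclass torch.utils.data.Dataset and torch.utils.data.IterableDataset, which is omitted here so the snippet runs without torch):

```python
class MapStyleDataset:
    """Map-style: supports random access via dataset[idx] and len(dataset)."""
    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)


class IterableStyleDataset:
    """Iterable-style: sequential iteration only; no indexing, no known length."""
    def __init__(self, source):
        self.source = source  # e.g. a generator or a stream

    def __iter__(self):
        return iter(self.source)


ds = MapStyleDataset([10, 20, 30])
print(ds[1], len(ds))  # random access works: 20 3

stream = IterableStyleDataset(x * x for x in range(3))
print(list(stream))    # sequential access only: [0, 1, 4]
```

Note that `stream[1]` would raise a TypeError, since the iterable-style class defines no __getitem__.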
I understand the main difference between these datasets: IterableDataset provides a clean way to yield data from e.g. a stream, i.e. where the length of the dataset is unknown or cannot easily be calculated. Inside the __iter__ method you would therefore have to make sure to exit the iteration at some point, e.g. when your data stream is empty.
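The "exit the iteration when the stream is empty" idea can be sketched like this. `fake_stream` is a hypothetical stand-in for a real source such as a socket or a log tail, and the None sentinel is my own convention for signaling exhaustion; again, a real implementation would subclass torch.utils.data.IterableDataset:

```python
def fake_stream(items):
    """Simulated data stream: yields samples, then None once it runs dry."""
    for item in items:
        yield item
    yield None


class StreamDataset:
    """Iterable-style dataset over a stream of unknown length."""
    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        for sample in self.stream:
            if sample is None:  # stream is empty -> stop iterating
                return
            yield sample


samples = list(StreamDataset(fake_stream(["a", "b", "c"])))
print(samples)  # -> ['a', 'b', 'c']
```

Because the length is unknown up front, there is nothing meaningful for __len__ or __getitem__ to return here, which is exactly why the iterable-style protocol drops them.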