_DataLoaderIter vs _BaseDataLoaderIter

Hi all,

I am confused about the iterator class of DataLoader. In particular, I wanted to ask whether the implementation has fundamentally changed between some PyTorch versions.

In the online documentation I can only find the class _BaseDataLoaderIter(object) and its subclasses _SingleProcessDataLoaderIter(_BaseDataLoaderIter) and _MultiProcessingDataLoaderIter(_BaseDataLoaderIter). However, when I look at the code installed on my PC in anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py, these three classes do not exist; instead there is only a single class called _DataLoaderIter(object) (which seems somewhat similar to the implementation of _MultiProcessingDataLoaderIter, but they are not exactly the same).

Why can't I find the code of _DataLoaderIter(object) in the documentation? Does it have to do with different PyTorch versions? If so, what consequences does that have if I use custom dataset, sampler, and collate_fn functions? Will they work in either PyTorch version?



All the classes that start with an underscore like _Foo are internal: they are not documented and can change between versions without notice.
The latest big change there I can think of is: https://github.com/pytorch/pytorch/pull/19228
Which version of pytorch do you currently have installed?


Thank you for your reply!
I am currently using version 1.0.1.post2.
Your link indeed explains that _DataLoaderIter was split up into the two classes I mentioned above. Does this mean that if I now implement a custom collate_fn, sampler, and dataset, they might not work on a newer PyTorch version anymore?

It should.
Your code should only touch the public Dataset and Sampler classes, not the _* ones anyway, right?
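To illustrate what "only touching the public classes" looks like, here is a minimal sketch of a custom map-style dataset (the SquaresDataset name and its contents are made up for this example); it relies solely on the documented Dataset and DataLoader interfaces, so it is unaffected by internal refactors like the iterator split:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Toy map-style dataset using only the public Dataset API."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # __getitem__ receives a single index from the sampler
        return torch.tensor(idx ** 2)

loader = DataLoader(SquaresDataset(8), batch_size=4, shuffle=False)
batches = [b.tolist() for b in loader]
print(batches)  # [[0, 1, 4, 9], [16, 25, 36, 49]]
```

Because nothing here touches a `_*` class, the same code runs on versions before and after the iterator refactor.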

Well, I am trying to make the __getitem__ method accept two indices as parameters so that I can select data from a 3D tensor. To do so, I at least have to rewrite the collate_fn, too. Maybe even more, but I haven't got that far yet.

Can you linearize your two indices, treating the data as one larger 1D collection? That way you can use the base loader.
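As a concrete sketch of that linearization idea (GridDataset and its shapes are hypothetical here, not from the thread): map a single flat index back to a (row, col) pair with divmod, so the default sampler and collate_fn work unchanged:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class GridDataset(Dataset):
    """Treats a 3D tensor of shape (rows, cols, features) as a flat
    dataset of rows*cols feature vectors, so a single integer index
    is enough for __getitem__."""
    def __init__(self, data):
        self.data = data          # shape (rows, cols, features)
        self.cols = data.shape[1]

    def __len__(self):
        return self.data.shape[0] * self.data.shape[1]

    def __getitem__(self, flat_idx):
        # Recover the 2D index pair from the flat index
        row, col = divmod(flat_idx, self.cols)
        return self.data[row, col]

data = torch.arange(24.0).reshape(2, 3, 4)   # 2 rows, 3 cols, 4 features
loader = DataLoader(GridDataset(data), batch_size=3)
first = next(iter(loader))
print(first.shape)  # torch.Size([3, 4])
```

With this approach no custom sampler or collate_fn is needed at all.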

Only implementation details changed, so a custom dataset, sampler, and collate_fn that use public APIs will work as-is.
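For completeness, a custom collate_fn also only plugs into the public DataLoader argument; here is a small sketch (the pad_collate name and the padding behavior are invented for illustration) that pads variable-length 1D tensors into one batch:

```python
import torch
from torch.utils.data import DataLoader

def pad_collate(batch):
    """Zero-pads a list of 1D tensors to the longest length in the batch."""
    max_len = max(t.shape[0] for t in batch)
    out = torch.zeros(len(batch), max_len)
    for i, t in enumerate(batch):
        out[i, : t.shape[0]] = t
    return out

# A plain list works as a map-style dataset (it has __len__/__getitem__)
samples = [torch.ones(2), torch.ones(3), torch.ones(1)]
loader = DataLoader(samples, batch_size=3, collate_fn=pad_collate)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([3, 3])
```

Since the function is handed to DataLoader rather than patched into an internal class, it keeps working across the iterator refactor.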


Thanks a lot for your replies both of you.
I have just realized that in both implementations, the __getitem__ method is always assumed to take only one argument. In the _DataLoaderIter class there is line 615:
batch = self.collate_fn([self.dataset[i] for i in indices])
and in the other case, when _MapDatasetFetcher is used, there is the line:
data = [self.dataset[idx] for idx in possibly_batched_index]
Thus, both implementations require that __getitem__ take only one argument, and since both of the above are internal, I guess I should not be changing them.
So is there really no way for me to adjust __getitem__ to accept two indices and hence make use of my 3D dataset?
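One possible workaround, sketched below under the assumption that the loader passes sampler output to __getitem__ unmodified (which both code lines quoted above suggest): __getitem__ does take a single argument, but that argument can be a (row, col) tuple yielded by a custom sampler. The PairDataset and PairSampler names are invented for this example:

```python
import torch
from torch.utils.data import Dataset, DataLoader, Sampler

class PairDataset(Dataset):
    """__getitem__ takes one index object, but that object can be a
    (row, col) pair -- the DataLoader never inspects it."""
    def __init__(self, data):
        self.data = data  # 3D tensor of shape (rows, cols, features)

    def __len__(self):
        return self.data.shape[0] * self.data.shape[1]

    def __getitem__(self, idx):
        row, col = idx            # unpack the pair
        return self.data[row, col]

class PairSampler(Sampler):
    """Yields (row, col) tuples instead of flat integer indices."""
    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols

    def __len__(self):
        return self.rows * self.cols

    def __iter__(self):
        for r in range(self.rows):
            for c in range(self.cols):
                yield (r, c)

data = torch.arange(24.0).reshape(2, 3, 4)
loader = DataLoader(PairDataset(data), batch_size=3,
                    sampler=PairSampler(2, 3))
first = next(iter(loader))
print(first.shape)  # torch.Size([3, 4])
```

This stays entirely within the public Dataset/Sampler/DataLoader interfaces, so no internal class needs to be modified.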