If I do so, will iterating over 2 DataLoaders (backed by the same dataset) interfere with each other?
Yes, you can create multiple DataLoaders and use them. I’m not sure what the concern is, but in case you are using multiple workers in these DataLoaders, note that each loader will create its own workers, and they will create batches in the background once you start to iterate, which might or might not be desirable.
Let’s say that I have 2 DataLoaders, dl1 and dl2, backed by the same dataset. While I’m iterating through dl1 (and haven’t finished yet), I then iterate through dl2 completely. Will they still both behave correctly?
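A minimal sketch of the scenario described above, assuming a toy TensorDataset with 8 samples and the default num_workers=0 (the dataset contents and batch sizes are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 8 samples holding the values 0..7.
dataset = TensorDataset(torch.arange(8))

# Two independent loaders backed by the same dataset object.
dl1 = DataLoader(dataset, batch_size=2, shuffle=True)
dl2 = DataLoader(dataset, batch_size=4, shuffle=True)

it1 = iter(dl1)
first_batch = next(it1)[0]  # start iterating dl1, but do not finish

# Iterate dl2 completely while dl1 is paused.
seen_dl2 = torch.cat([batch[0] for batch in dl2])

# Resume dl1 where it left off.
seen_dl1 = torch.cat([first_batch] + [batch[0] for batch in it1])

# Both loaders should have visited every sample exactly once.
print(sorted(seen_dl1.tolist()))
print(sorted(seen_dl2.tolist()))
```

Each loader keeps its own sampler and iterator state, so pausing one and running the other should not change which samples either of them yields, as long as the Dataset itself is not modified during iteration.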
It is essentially a question of how a DataLoader is implemented. Does a DataLoader mutate its underlying dataset? If a DataLoader does not mutate its underlying dataset in any way but just creates randomly shuffled indices to access the dataset, then multiple DataLoaders backed by the same dataset won’t interfere with each other, whether or not they are used at the same time.
The DataLoader itself will not mutate the Dataset, as it’s calling into the Dataset to get the data, create batches, shuffle, etc.
However, Dataset.__getitem__ could mutate the data in case you are manipulating it in place (this is usually not wanted and has caused errors in the past).
There is also a difference in behavior between a single worker (main process) and multiple workers, as the latter will create a copy of the Dataset in each worker. So with multiple workers, even if you are manipulating the data in place in the __getitem__ method, these manipulations won’t be stored in the original Dataset.
TL;DR: check Dataset.__getitem__ and make sure the data is not manipulated in place.
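To illustrate the kind of in-place manipulation to look for, here is a small sketch with a hypothetical InplaceDataset whose __getitem__ increments its stored tensor. It assumes the default num_workers=0, so everything runs in the main process:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class InplaceDataset(Dataset):
    """Hypothetical dataset whose __getitem__ modifies its stored tensor."""

    def __init__(self):
        self.data = torch.zeros(4)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        self.data[idx] += 1            # in-place change to the stored data
        return self.data[idx].clone()  # return a copy of the modified value

dataset = InplaceDataset()
dl1 = DataLoader(dataset, batch_size=2)  # num_workers=0 by default
dl2 = DataLoader(dataset, batch_size=2)

out1 = torch.cat(list(dl1))  # first pass over the data
out2 = torch.cat(list(dl2))  # second pass sees the already-modified data

print(out1)  # tensor([1., 1., 1., 1.])
print(out2)  # tensor([2., 2., 2., 2.])
```

The second loader returns different values for the same indices because the first pass already changed the stored tensor. Working on a copy inside __getitem__ (e.g. cloning the sample before modifying it) would avoid this.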
That’s a very good point. In my use case, I don’t leverage multi-threading or multi-processing, so even if Dataset.__getitem__ mutates the data, e.g., by creating a cache, it won’t affect iterating over 2 DataLoaders on top of it (which might iterate in turn).
It’s the other way around. Since you are not using multiple processes, the Dataset will not be copied, and manipulations in the Dataset will be visible in all DataLoaders, so be careful about it.
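As a rough illustration of the caching case, the sketch below uses a hypothetical CachingDataset (the class, its loads counter, and the sizes are assumptions made for this example) with num_workers=0. Because both loaders call into the same Dataset object, a cache filled through one loader is visible to the other:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class CachingDataset(Dataset):
    """Hypothetical dataset that caches each sample the first time it is loaded."""

    def __init__(self, n):
        self.n = n
        self.cache = {}
        self.loads = 0  # counts how often a sample had to be created

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if idx not in self.cache:
            self.loads += 1
            self.cache[idx] = torch.tensor(float(idx))
        return self.cache[idx]

dataset = CachingDataset(6)
dl1 = DataLoader(dataset, batch_size=2, shuffle=True)
dl2 = DataLoader(dataset, batch_size=3, shuffle=True)

it1 = iter(dl1)
next(it1)      # dl1 loads one batch, then pauses
for _ in dl2:  # dl2 runs to completion and fills the rest of the cache
    pass
for _ in it1:  # dl1 resumes and only hits cached entries
    pass

# Every sample was created once, although both loaders visited all of them.
print(dataset.loads, len(dataset.cache))
```

A cache like this is harmless as long as the cached values themselves are never modified in place. With num_workers > 0 each worker would hold its own copy of the Dataset, so the cache would not be shared this way.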