Can I create multiple DataLoaders on one dataset?

If I do so, will iterating over two DataLoaders (backed by the same dataset) interfere with each other?

Yes, you can create multiple DataLoaders and use them. I’m not sure what the concern is, but if these DataLoaders use multiple workers, note that each loader creates its own workers, which start preparing batches in the background as soon as you begin iterating. That might or might not be desirable.

Let’s say that I have two DataLoaders, dl1 and dl2, backed by the same dataset. While I’m iterating through dl1 (and haven’t finished yet), I iterate through dl2 completely. Will they still both behave correctly?

It is essentially a question of how a DataLoader is implemented. Does a DataLoader mutate its underlying dataset? If a DataLoader does not mutate its underlying dataset in any way but just creates randomly shuffled indices to access the dataset, then multiple DataLoaders backed by the same dataset won’t interfere with each other, whether or not they are used at the same time.

The DataLoader itself will not mutate the Dataset, as it’s calling into the Dataset to get the data, create batches, shuffle etc.
However, Dataset.__getitem__ could mutate the data if you manipulate it in place (this is usually not wanted and has caused errors in the past).
There is also a difference between a single worker (num_workers=0, where loading happens in the main process) and multiple workers, as the latter create a copy of the Dataset in each worker process. So with multiple workers, even if you manipulate the data in place in the __getitem__ method, these manipulations won’t be stored in the original Dataset.
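
To illustrate the first point, here is a minimal sketch of interleaved iteration over a read-only dataset. The toy `TensorDataset`, the batch sizes, and the variable names are assumptions chosen for illustration; the loaders use the default `num_workers=0`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: ten scalar samples 0..9 (read-only __getitem__).
dataset = TensorDataset(torch.arange(10))

dl1 = DataLoader(dataset, batch_size=2, shuffle=True)
dl2 = DataLoader(dataset, batch_size=5, shuffle=True)

it1 = iter(dl1)
first = next(it1)[0]  # start consuming dl1 ...

# ... then run dl2 to completion while dl1 is still unfinished ...
seen2 = torch.cat([batch[0] for batch in dl2])

# ... and finally drain the rest of dl1.
rest = [batch[0] for batch in it1]
seen1 = torch.cat([first] + rest)

# Each loader still visits every sample exactly once.
assert sorted(seen1.tolist()) == list(range(10))
assert sorted(seen2.tolist()) == list(range(10))
```

Each loader keeps its own iterator state, so pausing one and running the other does not drop or repeat samples as long as `__getitem__` only reads.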

TL;DR: check Dataset.__getitem__ and make sure the data is not manipulated in place.
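
As a rough sketch of what this pitfall can look like, the hypothetical `InPlaceDataset` below doubles its stored samples inside `__getitem__`, while the equally hypothetical `CopyDataset` returns a new value and leaves the storage alone. Both class names and the doubling "transform" are made up for illustration, and the loaders use the default `num_workers=0`.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class InPlaceDataset(Dataset):
    """Hypothetical dataset whose __getitem__ modifies its storage in place."""

    def __init__(self):
        self.data = torch.ones(4)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        self.data[idx] *= 2  # in place: the stored sample changes
        return self.data[idx]


class CopyDataset(InPlaceDataset):
    """Same data, but the transform returns a new tensor instead."""

    def __getitem__(self, idx):
        return self.data[idx] * 2  # out of place: storage is untouched


bad = InPlaceDataset()
batch1 = next(iter(DataLoader(bad, batch_size=4)))
batch2 = next(iter(DataLoader(bad, batch_size=4)))  # sees already-doubled data

good = CopyDataset()
g1 = next(iter(DataLoader(good, batch_size=4)))
g2 = next(iter(DataLoader(good, batch_size=4)))

assert batch1.tolist() == [2.0, 2.0, 2.0, 2.0]
assert batch2.tolist() == [4.0, 4.0, 4.0, 4.0]  # second loader gets different values
assert g1.tolist() == g2.tolist() == [2.0, 2.0, 2.0, 2.0]
```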


That’s a very good point. In my use case, I don’t leverage multi-threading or multi-processing, so even if Dataset.__getitem__ mutates the data, e.g., by populating a cache, it won’t affect iterating over two DataLoaders on top of it (which might iterate in turn).

It’s the other way around. Since you are not using multiple processes, the Dataset will not be copied and manipulations in the Dataset will be visible in all DataLoaders, so be careful about it.
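
For the caching case specifically, a small sketch of this shared state might look as follows. `CachedDataset`, its `loads` counter, and the sizes are hypothetical names and values used only for this example; both loaders run with the default `num_workers=0`, so they talk to the very same Dataset object.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CachedDataset(Dataset):
    """Hypothetical dataset that caches each sample on first access."""

    def __init__(self, n):
        self.n = n
        self.cache = {}
        self.loads = 0  # counts how often a sample is actually "loaded"

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if idx not in self.cache:
            self.loads += 1
            self.cache[idx] = torch.tensor(float(idx))
        return self.cache[idx]


ds = CachedDataset(6)
dl1 = DataLoader(ds, batch_size=2)
dl2 = DataLoader(ds, batch_size=3)

for _ in dl1:
    pass
loads_after_dl1 = ds.loads  # every sample was loaded once

for _ in dl2:
    pass
loads_after_dl2 = ds.loads  # dl2 was served from the cache filled by dl1

assert loads_after_dl1 == 6
assert loads_after_dl2 == 6
```

Here the shared state is harmless, since the cache is only filled and never altered afterwards. The same visibility becomes a problem once `__getitem__` changes the cached samples themselves.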