In DataLoader, is it possible to control which CPU workers are assigned which indices?

Let’s say I have 5 CPUs, a dataset of length 100, and a batch size of 8. Let’s say the DataLoader samples 8 indices in [0, 99] for a batch, for example {1, 2, 33, 55, 61, 62, 77, 78}.

Now I want to be able to control which CPU workers are assigned to fetch the corresponding data. I want CPU 1 to fetch indices in [0, 19], CPU 2 to fetch indices in [20, 39], CPU 3 to fetch indices in [40, 59], CPU 4 to fetch indices in [60, 79], and CPU 5 to fetch indices in [80, 99].

Consequently, CPU 1 should be assigned to fetch {1, 2}, CPU 2 should be assigned to fetch {33}, CPU 3 should be assigned to fetch {55}, CPU 4 should be assigned to fetch {61, 62, 77, 78}, and CPU 5 should not be assigned to fetch any indices.
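To make the desired mapping concrete, here is a minimal sketch in plain Python. The function name `group_by_worker` is hypothetical, and workers are numbered from 0 (so CPU 1 in the text is worker 0 here). It only illustrates the arithmetic of the assignment, not anything DataLoader does internally.

```python
import math


def group_by_worker(indices, dataset_len, num_workers):
    """Group indices by the worker that owns their contiguous range."""
    # Each worker owns a block of this many consecutive indices
    # (20 for a dataset of length 100 and 5 workers).
    per_worker = math.ceil(dataset_len / num_workers)
    groups = {worker: [] for worker in range(num_workers)}
    for index in indices:
        groups[index // per_worker].append(index)
    return groups


batch = [1, 2, 33, 55, 61, 62, 77, 78]
print(group_by_worker(batch, dataset_len=100, num_workers=5))
# {0: [1, 2], 1: [33], 2: [55], 3: [61, 62, 77, 78], 4: []}
```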

You can use [get_worker_info()](https://pytorch.org/docs/stable/data.html#torch.utils.data.get_worker_info) together with the `worker_init_fn` passed to DataLoader to configure each worker to read only a certain fraction of the Dataset. For example, you can modify the Dataset object in each worker, or you can make `__getitem__` ignore certain indices based on the worker ID.
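As a rough sketch of the first option (modifying the dataset copy inside each worker), the snippet below assumes an iterable-style dataset. `RangeDataset`, `shard_bounds`, and `make_loader` are hypothetical names made up for illustration; `get_worker_info()`, `worker_init_fn`, and `IterableDataset` are the real PyTorch pieces.

```python
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


def shard_bounds(start, end, worker_id, num_workers):
    """Return the [lo, hi) slice of [start, end) owned by one worker."""
    per_worker = -(-(end - start) // num_workers)  # ceiling division
    lo = min(start + worker_id * per_worker, end)
    hi = min(lo + per_worker, end)
    return lo, hi


class RangeDataset(IterableDataset):
    """Hypothetical dataset that simply yields the indices in [start, end)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        return iter(range(self.start, self.end))


def worker_init_fn(worker_id):
    # Runs once in each worker process; info.dataset is that worker's own copy,
    # so narrowing its range does not affect the other workers.
    info = get_worker_info()
    dataset = info.dataset
    dataset.start, dataset.end = shard_bounds(
        dataset.start, dataset.end, worker_id, info.num_workers
    )


def make_loader():
    # 5 workers over 100 items: worker 0 reads [0, 20), worker 1 reads [20, 40), ...
    return DataLoader(
        RangeDataset(0, 100),
        batch_size=8,
        num_workers=5,
        worker_init_fn=worker_init_fn,
    )
```

Iterating over `make_loader()` then yields batches where each batch comes from a single worker, so a batch only contains indices from that worker's range. That is different from splitting one sampled batch across several workers.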

Would those discarded indices still be retrieved by other workers, or not?

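With an iterable-style dataset, each worker gets its own copy of the dataset and iterates over all the indices, so you can decide in each worker what to do with them: an index that one worker skips is still produced by the worker responsible for that range. With a map-style dataset, the sampler runs in the main process and each batch of indices is sent to a single worker, so an index ignored in `__getitem__` is not picked up by another worker.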
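A minimal sketch of that idea, assuming an iterable-style dataset where every worker walks over all indices and keeps only its own. `FilteredIndices` and `owns` are hypothetical names; only `IterableDataset` and `get_worker_info()` come from PyTorch.

```python
from torch.utils.data import IterableDataset, get_worker_info


def owns(index, dataset_len, worker_id, num_workers):
    """True if `index` falls in the contiguous block assigned to `worker_id`."""
    per_worker = -(-dataset_len // num_workers)  # ceiling division
    return index // per_worker == worker_id


class FilteredIndices(IterableDataset):
    """Hypothetical dataset: each worker sees every index but keeps only its own."""

    def __init__(self, dataset_len):
        self.dataset_len = dataset_len

    def __iter__(self):
        info = get_worker_info()  # None when loading in the main process
        for index in range(self.dataset_len):
            # Without workers, keep everything; otherwise keep only owned indices.
            if info is None or owns(index, self.dataset_len, info.id, info.num_workers):
                yield index
```

Because every index is owned by exactly one worker, whatever one worker discards is still yielded by another, and nothing is read twice.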

Okay, that was the solution I had thought of.