There are two data simulation approaches in my training: one is fast and the other is much slower. I use the worker id to choose between them. For example, with 10 workers in total, workers 0 to 8 use the fast simulation and worker 9 uses the slow one.
Since DataLoader uses multiple processes, I assumed that the DataLoader and the training process work in a producer-consumer fashion: once a worker produces a data batch, it is added to a queue. On the other side, the training process gets data batches from the queue and waits if the queue is empty.
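To make that expectation concrete, here is a minimal sketch of the producer-consumer behaviour I had in mind. It is plain Python, not PyTorch code: threads stand in for the worker processes, and the names `worker` and `consume` as well as the delays are made up for illustration.

```python
import queue
import threading
import time


def worker(worker_id, delay, out_queue, n_batches):
    # Simulate n_batches batches and push each one as soon as it is ready.
    for i in range(n_batches):
        time.sleep(delay)  # stand-in for the data simulation
        out_queue.put((worker_id, i))


def consume(delays, n_batches, n_take):
    """Start one producer per entry in `delays` and return the worker ids
    of the first `n_take` batches, in the order they arrived."""
    out_queue = queue.Queue()
    for wid, delay in enumerate(delays):
        threading.Thread(
            target=worker,
            args=(wid, delay, out_queue, n_batches),
            daemon=True,
        ).start()
    # The consumer takes whatever is available and blocks only on an empty queue.
    return [out_queue.get()[0] for _ in range(n_take)]


# Hypothetical timings: workers 0 to 8 are fast, worker 9 is slow.
delays = [0.01] * 9 + [0.5]
first = consume(delays, n_batches=5, n_take=20)
print(first)
```

In this model the consumer is fed by the fast workers alone, and the slow worker only contributes a batch now and then without holding anything up.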
However, I found that the training time is the same as when all workers use the slow approach. So I deduce that the workers are not run by independent subprocesses but in a loop, and that the whole data simulation is slowed down by worker 9.
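Another reading that would give the same timing, even with independent workers, is that batches are handed to the training process in a fixed round-robin worker order. The toy timing model below shows the arithmetic. It is not PyTorch code, the in-order behaviour is an assumption, the function names and delays are hypothetical, and workers are assumed to keep producing without ever stalling.

```python
def in_order_time(delays, rounds):
    """Time to consume rounds * len(delays) batches when batch k must
    come from worker k % len(delays)."""
    n = len(delays)
    t = 0
    for k in range(rounds * n):
        w, j = k % n, k // n
        # Worker w finishes its j-th batch at (j + 1) * delays[w];
        # the consumer cannot take it earlier than that.
        t = max(t, (j + 1) * delays[w])
    return t


def any_order_time(delays, total):
    """Time until `total` batches are available when the consumer
    takes them in whatever order they arrive."""
    finish_times = sorted(
        (j + 1) * d for d in delays for j in range(total)
    )
    return finish_times[total - 1]


# Hypothetical delays in arbitrary time units.
mixed = [1] * 9 + [10]   # workers 0 to 8 fast, worker 9 slow
all_slow = [10] * 10

print(in_order_time(mixed, rounds=5))     # 50 batches, fixed worker order
print(in_order_time(all_slow, rounds=5))  # 50 batches, every worker slow
print(any_order_time(mixed, total=50))    # 50 batches, first come first served
```

Under the in-order assumption every round of 10 batches has to wait for one batch from worker 9, so the mixed setup takes as long as the all-slow setup, which matches what I observe.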
My question is: is there any way to keep the process from getting stuck on the slow worker? Are there any parameters I am missing? Thank you very much!