I have a question about how num_workers in DataLoader relates to batch_size and the number of epochs.

For example:

- Let's assume the total training set size is 2000, the batch_size is 20, and the DataLoader uses 10 num_workers. Does this mean that the DataLoader will return 20 * 10 = 200 examples in parallel, consisting of 20 examples from each of the DataLoader's workers?

Or

- The DataLoader returns 20 examples sequentially from each worker and finishes 10 epochs in parallel after 10 (number of workers) * 100 (iterations to complete one epoch) = 1000 iterations.

Or

- Something else? Kindly explain.
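
To make the two readings above concrete, here is the arithmetic each one implies for the numbers in my example. This is only a sketch in plain Python with no PyTorch calls; the variable names are mine and purely illustrative.

```python
import math

# Numbers taken from the example above.
train_size = 2000
batch_size = 20
num_workers = 10

# Baseline with a single loading process: one batch of 20 per iteration.
iters_per_epoch = math.ceil(train_size / batch_size)

# Reading 1: every iteration returns one batch from each worker at once.
examples_per_iter_reading1 = batch_size * num_workers
iters_per_epoch_reading1 = math.ceil(train_size / examples_per_iter_reading1)

# Reading 2: each worker runs its own epoch, and batches of 20 are
# returned one at a time, so 10 epochs take 10 * 100 iterations.
total_iters_reading2 = num_workers * iters_per_epoch

print(iters_per_epoch)             # 100
print(examples_per_iter_reading1)  # 200
print(iters_per_epoch_reading1)    # 10
print(total_iters_reading2)        # 1000
```

These numbers only spell out what each reading would imply; which one (if either) matches the actual DataLoader behaviour is exactly what I am asking.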

If the first one is correct, then is the loss aggregated across all 200 examples when it is calculated?

If the second one is correct, then the loss will be calculated on 20 examples from the 1st worker, then the 2nd worker, then the 3rd, and so on until the end. My question here is: if the loss is calculated on 20 examples from each worker before proceeding to the next batch, is there any concern that the model could converge to a totally different minimum than it would with a single worker? With more workers, the model would presumably take different steps towards the minimum in batch gradient descent than it would with a single worker.

If my understanding is wrong, I would be extremely thankful if someone could clarify.