Each worker preloads a complete batch. If all workers start at the same time, they may also finish at roughly the same time, so in your case 4 batches could be ready at once. Your actual model workload seems small compared to the data loading, so training on these 4 batches finishes quickly. Meanwhile the workers have already started loading the next batches but cannot keep up with the model training, so you clearly have a data loading bottleneck in your code.
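One way to confirm this is to time how long the training loop sits waiting for each batch: if the wait drops to near zero after the first few batches, loading keeps up; if it stays high or spikes periodically, the loader is the bottleneck. Here is a minimal stdlib-only sketch (the `slow_loader` generator is a hypothetical stand-in for your `DataLoader`, with an artificial 10 ms load time per batch):

```python
import time

def time_batches(loader, n=20):
    """Measure how long the training loop waits for each batch.

    Large or periodic wait times after warmup suggest the data
    loading pipeline cannot keep up with the model workload.
    """
    waits = []
    it = iter(loader)
    for _ in range(n):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # blocks until the next batch is ready
        except StopIteration:
            break
        waits.append(time.perf_counter() - t0)
    return waits

# Hypothetical stand-in for a DataLoader: each batch takes ~10 ms to "load".
def slow_loader(num_batches=20, load_time=0.01):
    for i in range(num_batches):
        time.sleep(load_time)
        yield i

waits = time_batches(slow_loader())
print(f"mean wait per batch: {sum(waits) / len(waits) * 1000:.1f} ms")
```

With a real `DataLoader` you would pass the loader object directly and compare the measured wait against your per-step compute time; raising `num_workers` (or `prefetch_factor`) helps only while the workers can collectively produce batches faster than the model consumes them.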
Have a look at this post for a general explanation and some advice.