I notice a strange behavior when training.
For instance, with a batch size of N=(1, 2, 3, …) and 8 (or more) workers,
the training loop pauses for a few seconds every 8th iteration, as if that pause were being used to load data or something of the like.
This behavior appears on a GTX 1080 Ti GPU.
The same code when executed on a TITAN RTX does not exhibit the above behavior.
Has someone else encountered this issue? Could this be an issue with my machine? @ptrblck
The issue does indeed sound like a data loading bottleneck, similar to this one, so you could profile your code to isolate the bottleneck further.
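A minimal way to profile this is to time the `DataLoader` fetch separately from the forward/backward pass. The sketch below uses a dummy dataset and model just to be self-contained; substitute your own loader, model, and `num_workers` value.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset and model as stand-ins; replace with your own.
dataset = TensorDataset(torch.randn(256, 8), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32, num_workers=0)  # set num_workers to your real value
model = torch.nn.Linear(8, 2)

data_times, compute_times = [], []
end = time.perf_counter()
for data, target in loader:
    data_times.append(time.perf_counter() - end)  # time spent waiting on the loader
    t0 = time.perf_counter()
    loss = torch.nn.functional.cross_entropy(model(data), target)
    loss.backward()
    compute_times.append(time.perf_counter() - t0)
    end = time.perf_counter()

print(f"data wait: {sum(data_times):.4f}s, compute: {sum(compute_times):.4f}s")
```

If `data_times` shows large spikes at a regular interval (e.g. every 8th iteration with 8 workers), the workers cannot prefetch batches fast enough and the loop stalls waiting for them.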
As a quick test, you could replace the data loading with a single random tensor and let the model train on it. If the periodic slowdown is gone, it would point towards the data loading.
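The quick test could look like the following sketch, where the model, shapes, and optimizer are placeholders for your actual setup; the key point is that a single pre-created tensor replaces the `DataLoader` entirely.

```python
import time
import torch
import torch.nn as nn

# Placeholder model and batch shape; substitute your own.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Single static random batch instead of the DataLoader.
x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

times = []
for it in range(50):
    t0 = time.perf_counter()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the GPU so the timing is meaningful
    times.append(time.perf_counter() - t0)

print(f"min iter: {min(times):.6f}s, max iter: {max(times):.6f}s")
```

If the per-iteration times are now uniform (no spike every 8th iteration), the periodic pause was caused by the data loading pipeline rather than the model or the GPU.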