To me, after some practicality checks, the following worked smoothly: the `num_workers` argument of `torch.utils.data.DataLoader` should be set to `4 * num_GPU`; 8 or 16 should generally be good (a timing sketch follows the list):
- `num_workers=1`, up to 50% GPU, for 1 epoch (106s over 1/10 epochs), training completes in 43m 24s
- `num_workers=32`, up to 90% GPU, for 1 epoch (42s over 1/14 epochs), training completes in 11m 9s
- `num_workers=8` or 16, up to 90% GPU (8 is slightly better), for 1 epoch (40.5s over 1/16 epochs), training completes in 10m 52s
- `num_workers=0`, up to 40% GPU, for 1 epoch (129s over 1/16 epochs), training completes in 34m 32s
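
Here is a minimal sketch of how such a comparison can be run. The dataset below is a synthetic in-memory stand-in (the dataset, model, batch size, and worker counts from my benchmark are not shown here, so these are assumptions), and absolute timings will differ from the numbers above:

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Synthetic in-memory stand-in for the real dataset:
    # 2,000 "images" of shape 3x64x64 with integer class labels.
    images = torch.randn(2_000, 3, 64, 64)
    labels = torch.randint(0, 10, (2_000,))
    dataset = TensorDataset(images, labels)

    for num_workers in (0, 1, 8, 16, 32):
        loader = DataLoader(
            dataset,
            batch_size=64,
            shuffle=True,
            num_workers=num_workers,
            pin_memory=torch.cuda.is_available(),  # faster host-to-GPU copies
        )
        start = time.time()
        for x, y in loader:
            pass  # the real training step would run here
        print(f"num_workers={num_workers}: {time.time() - start:.2f}s per epoch")


if __name__ == "__main__":
    # The guard is required when num_workers > 0 on platforms that spawn
    # worker processes (e.g. Windows, macOS).
    main()
```

Note that because this toy dataset already sits in RAM, extra workers mostly add multiprocessing overhead here; the GPU-utilization gains in the list above show up when each sample involves real disk I/O or decoding.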