I am running a DDP run with 2 GPUs, and at every 10 epochs I evaluate my test data.
For some reason, the first 10 epochs run very slow - until the evaluation process occurs. Can anyone point me to the reason for this?
Attaching visualization of GPU utilization:
My train/test dataloaders use utils.data.distributed.DistributedSampler as samplers
Can you share a minimal reproducible example?
At the moment, my code is a bit too complex to share.
I was hoping maybe someone might be able to point in the right direction.
Solved! sort of…
torch.set_num_threads(1) before starting to train solves the problem. Though I have no idea why.
If anyone has any light to shed on this matter, it would be greatly appreciated!
Thank you! I will look into this.