ML training time increases after first epoch when using two GPUs

I am training a U-Net with 256×256 input images on two GPUs, using a batch size of 32 and `nn.DataParallel` in PyTorch. The first epoch takes 11 minutes, but the second epoch takes 58 minutes. Does anyone know why this happens?
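My training loop looks roughly like this (a minimal, runnable sketch; the real U-Net and dataset are replaced with stand-ins, since the exact code is not shown here):

```python
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the real U-Net; the actual architecture is not shown here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
model = nn.DataParallel(model).cuda()  # splits each batch across both GPUs

# Stand-in data: 320 random 256x256 images with binary masks.
images = torch.randn(320, 3, 256, 256)
masks = torch.randint(0, 2, (320, 1, 256, 256)).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=32)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(3):
    start = time.time()
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()  # wait for pending GPU work so the timing is accurate
    print(f"epoch {epoch}: {time.time() - start:.1f} s")
```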
Thank you for your help.

Could you check whether your system might be overheating and reducing the GPU clocks during training?
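For example, you could watch `nvidia-smi` during an epoch, or poll the temperature and SM clock from Python with pynvml (a rough sketch; assumes the `nvidia-ml-py` package is installed):

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

# Poll every 5 seconds while training runs in another process; stop with Ctrl-C.
# A sustained clock drop while the temperature sits near its limit would
# point to thermal throttling.
while True:
    for i, h in enumerate(handles):
        temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
        clock = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_SM)
        print(f"GPU {i}: {temp} C, SM clock {clock} MHz")
    time.sleep(5)
```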


Thank you. That may be the reason.