I am using a U-Net with an input image size of 256 and two GPUs. I have set the batch size to 32 and used DataParallel in PyTorch. The first epoch takes 11 min to train, but the second epoch takes 58 min. Does anyone know why this happens?
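For reference, my code roughly follows the pattern below. This is only a sketch of the setup: the U-Net itself and the dataset are replaced by small placeholders so the snippet can run anywhere with two GPUs, and it is not my actual model or data pipeline.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder standing in for the actual U-Net definition.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)

# DataParallel splits each batch of 32 across the two GPUs (16 images each).
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()

# Dummy 256x256 tensors standing in for the real segmentation dataset.
data = TensorDataset(torch.randn(64, 3, 256, 256), torch.randn(64, 1, 256, 256))
loader = DataLoader(data, batch_size=32, shuffle=True, num_workers=4, pin_memory=True)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(2):
    for images, targets in loader:
        images = images.cuda(non_blocking=True)
        targets = targets.cuda(non_blocking=True)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
```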
Thank you for your help.
Could you check if your system might be overheating and reducing the clocks during training?
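If that is the case, you should see the SM clocks drop (while the temperature stays high) during the slow epoch. One way to check is to poll `nvidia-smi` from a separate terminal while training runs; here is a rough helper for that (the query fields and the polling interval are just a starting point, adjust as needed):

```python
import subprocess
import time

def log_gpu_status(interval_s=5.0, duration_s=120.0):
    """Poll nvidia-smi and print temperature, SM clock, and utilization per GPU."""
    query = "index,temperature.gpu,clocks.sm,utilization.gpu"
    end = time.time() + duration_s
    while time.time() < end:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={query}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        for line in out.splitlines():
            idx, temp, clock, util = [v.strip() for v in line.split(",")]
            print(f"GPU {idx}: {temp} C, SM clock {clock} MHz, util {util} %")
        time.sleep(interval_s)

if __name__ == "__main__":
    log_gpu_status()
```

`nvidia-smi -q -d PERFORMANCE` additionally lists the active throttle reasons (e.g. thermal slowdown), which would confirm it directly.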
Thank you. It may be the reason.