Epochs show large differences in running time

Hi,
When I ran my PyTorch model on an HPC cluster, I used 4 CPU cores and batch size = 1.
The running time of each epoch ranges from 200 to 4000 s. I tested 35 jobs, and only 5 of them show such results; in the other jobs, each epoch runs in only 80-90 s. Could anyone give some suggestions about these results?

Here are some logs (the number after each epoch is its running time in seconds):
epoch:0; Training Loss:[0.07358009 0. 0. 0. 0. ]
474.0903596878052
epoch:1; Training Loss:[0.06944189 0. 0. 0. 0. ]
603.3289322853088
epoch:2; Training Loss:[0.06749273 0. 0. 0. 0. ]
4037.6397545337677
epoch:3; Training Loss:[0.06656914 0. 0. 0. 0. ]
900.0050981044769
epoch:4; Training Loss:[0.06594099 0. 0. 0. 0. ]
414.7940049171448
epoch:5; Training Loss:[0.0652342 0. 0. 0. 0. ]
1492.0406074523926
epoch:6; Training Loss:[0.06477792 0. 0. 0. 0. ]
1997.3091661930084
epoch:7; Training Loss:[0.06439936 0. 0. 0. 0. ]
1552.9386072158813
epoch:8; Training Loss:[0.06404987 0. 0. 0. 0. ]
1519.947157382965
epoch:9; Training Loss:[0.06379405 0. 0. 0. 0. ]
1737.3557515144348
epoch:10; Training Loss:[0.06337144 0. 0. 0. 0. ]
323.754665851593
epoch:11; Training Loss:[0.0630567 0. 0. 0. 0. ]
220.557555437088

Thank you for your help.

Hey!
There can be many, many reasons for this, from CPU usage to a slow disk, or even remote disks, depending on your cluster.
I would suggest profiling your training loop to see which part causes this slowdown and going from there, for example with something like the sketch below!
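A minimal first step is to time the data-loading wait separately from the forward/backward work: if the slow epochs are dominated by data time, the variance is likely disk or network IO on the cluster rather than compute. A rough sketch, where `model`, `loader`, `criterion`, and `optimizer` are placeholders for your own objects:

```python
import time

def timed_epoch(model, loader, criterion, optimizer):
    """Run one epoch and report time spent waiting on data vs. computing."""
    data_time = 0.0     # time spent waiting for the DataLoader (disk / IO)
    compute_time = 0.0  # time spent in forward/backward/step (CPU work)

    end = time.perf_counter()
    for inputs, targets in loader:
        # Time elapsed since the previous step ended = waiting on the next batch.
        data_time += time.perf_counter() - end

        start = time.perf_counter()
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        compute_time += time.perf_counter() - start

        end = time.perf_counter()

    print(f"data: {data_time:.1f}s  compute: {compute_time:.1f}s")
```

If the manual timers are not detailed enough, `torch.profiler` can give a per-operator breakdown.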

Thank you. I will do that and check which part of the code costs too much time.