Training is slow

I am running my code on a server with 7 GPUs. Initially, the total execution took around 40 seconds when I ran:

python file.py

But for the last 2 days, the same code has been taking much longer (similar to CPU speed). I even tried allocating different GPUs (cuda:0 and cuda:1), but it is still slow.

What could be the issue?

Did you change anything in the last 2 days, e.g. reinstalled drivers, libraries, etc.?
If you haven’t changed anything and the code is suddenly slow, I would guess your setup is in a “bad” state. E.g. are you using a network drive and could the network have issues?
To narrow down the root cause, you could profile separate code snippets and check which part of the training is slow.
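Something like this minimal timing sketch could show whether the host-to-device copy, the forward pass, or the backward pass is the slow part (model, criterion, and loader are placeholders for whatever is defined in file.py). Since CUDA operations are asynchronous, you need torch.cuda.synchronize() before taking the timestamps, otherwise you would only measure the kernel launch time:

import time
import torch

# Placeholders -- swap in the model, criterion, and dataloader from your script.
device = torch.device("cuda:0")

def timed(label, fn):
    # Synchronize before and after so the asynchronous CUDA work is
    # actually finished when the timestamps are taken.
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn()
    torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - start:.4f}s")
    return out

for data, target in loader:
    data, target = timed("h2d copy", lambda: (data.to(device), target.to(device)))
    output = timed("forward", lambda: model(data))
    loss = timed("loss", lambda: criterion(output, target))
    timed("backward", lambda: loss.backward())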

No, I did not change the code, and I do not have a network issue; it is probably a setup problem.
Thanks for your detailed info.