When I run the same code on a cloud machine, the inference time for 1000 images differs by 10 seconds. What could be the reason?
Anything could be responsible, from the hardware, driver, and math libraries to the PyTorch release and your actual code.
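Since any of these layers could differ, a first step might be to dump the same environment summary on both machines and diff the output. The sketch below is only an illustration: `collect_env` is a hypothetical helper name, and the set of fields is an assumption about what is worth comparing, not an exhaustive list.

```python
import platform


def collect_env():
    """Gather a few environment facts that commonly differ between machines."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "cpu": platform.processor() or platform.machine(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    info["cuda_runtime"] = torch.version.cuda        # None on CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()   # None if cuDNN is unavailable
    info["cpu_threads"] = torch.get_num_threads()
    info["gpu"] = (
        torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
    )
    return info


env = collect_env()
for key, value in env.items():
    print(f"{key:>14}: {value}")
```

PyTorch also ships a more detailed report via `python -m torch.utils.collect_env`, which is probably the better choice if you want to post the results here.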
Profile your workload with the native PyTorch profiler or, e.g., Nsight Systems, and compare the profiles from both systems.
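A minimal sketch of such a profile with `torch.profiler` could look like the following. The `Linear` model and random batch are placeholders standing in for your real model and images, and `profile_inference` is a hypothetical helper name. The warm-up iterations are there so that one-time setup costs are less likely to dominate the numbers.

```python
import torch
from torch.profiler import ProfilerActivity, profile


def profile_inference(model, batch, steps=5, warmup=2):
    """Profile a few inference steps and return a table of the costliest ops."""
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        # Also record GPU kernels when a CUDA device is present.
        activities.append(ProfilerActivity.CUDA)

    model.eval()
    with torch.no_grad():
        # Warm-up runs are executed outside the profiled region.
        for _ in range(warmup):
            model(batch)
        with profile(activities=activities, record_shapes=True) as prof:
            for _ in range(steps):
                model(batch)

    return prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)


# Placeholder workload (assumption): replace with your own model and inputs.
model = torch.nn.Linear(64, 8)
batch = torch.randn(16, 64)
if torch.cuda.is_available():
    model, batch = model.cuda(), batch.cuda()

report = profile_inference(model, batch)
print(report)
```

Run the same script on both machines and compare the tables side by side. Operators whose time differs noticeably between the two reports are a reasonable place to start looking.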