Running speeds of a model with different weights are quite different

I trained a model twice. When I tested the best model from each run, the FPS was around 8 and 2, respectively. Nothing was changed except the weights of the model. How could this happen?

Could you explain your profiling and use case a bit?
Also, are you using the same input shapes?
Is the model running on the CPU or the GPU? On the CPU, you might see a performance hit from denormal values in the weights or activations, which are slower to process. In that case, try calling torch.set_flush_denormal(True) and rerun the code. If you are using the GPU, CUDA kernels are launched asynchronously, so make sure to synchronize the code properly via torch.cuda.synchronize() before starting and stopping the timer.
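In case it helps, here is a rough sketch of how the timing could be done with both suggestions applied. The helper name measure_fps, the tiny stand-in network, and the input shape are all made up for illustration; swap in your own model, each of your two checkpoints, and your real input shape.

```python
import time

import torch
import torch.nn as nn


def measure_fps(model, example_input, warmup=3, iters=10):
    """Return forward passes per second (hypothetical helper, not a PyTorch API)."""
    use_cuda = example_input.is_cuda
    model.eval()
    with torch.no_grad():
        # Warm-up iterations so one-time setup costs are not timed.
        for _ in range(warmup):
            model(example_input)
        if use_cuda:
            torch.cuda.synchronize()  # wait for pending kernels before starting the timer
        start = time.perf_counter()
        for _ in range(iters):
            model(example_input)
        if use_cuda:
            torch.cuda.synchronize()  # wait for the timed kernels before stopping the timer
        elapsed = time.perf_counter() - start
    return iters / elapsed


# Ask the CPU to flush denormals to zero; returns False if the hardware does not support it.
flush_supported = torch.set_flush_denormal(True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in network (an assumption): replace with your model and load each checkpoint in turn.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
).to(device)

# Use exactly the same input shape for both checkpoints.
example_input = torch.randn(1, 3, 64, 64, device=device)

fps = measure_fps(model, example_input)
print(f"flush denormal supported: {flush_supported}, FPS: {fps:.1f}")
```

If the two checkpoints still differ on the CPU after the flush is enabled, or on the GPU with proper synchronization, the gap is probably coming from somewhere other than the forward pass itself (data loading, post-processing, etc.).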