Why is PyTorch's result faster?

I compared PyTorch (eager mode) against PyTorch JIT (TorchScript).
Before the timed inference runs, I did 4 warm-up iterations for each model.

PyTorch load:

import torchvision.models as models

model = models.resnet152(pretrained=True).cuda().eval()

PyTorch JIT load:

import torch

jit_model = torch.jit.load('./resnet152_model.pt').cuda().eval()
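For context, my timing loop looks roughly like the sketch below (the input shape and iteration counts are placeholders I picked for illustration; the torch.cuda.synchronize() calls are there so the asynchronous CUDA kernels are actually finished before the timer is read):

import time

import torch
import torchvision.models as models

# Same two loads as above.
model = models.resnet152(pretrained=True).cuda().eval()
jit_model = torch.jit.load('./resnet152_model.pt').cuda().eval()

x = torch.randn(1, 3, 224, 224).cuda()  # dummy ImageNet-sized input

def benchmark(m, n_warmup=4, n_iters=100):
    with torch.no_grad():
        for _ in range(n_warmup):   # the 4 warm-up runs, not timed
            m(x)
        torch.cuda.synchronize()    # wait for warm-up kernels to finish
        start = time.time()
        for _ in range(n_iters):
            m(x)
        torch.cuda.synchronize()    # wait for timed kernels to finish
    return (time.time() - start) / n_iters

print('eager:', benchmark(model))
print('jit:  ', benchmark(jit_model))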

After the warm-up, inference got faster in both cases.
For PyTorch JIT the result makes sense: the JIT compiler caches the optimized graphs along with the bytecode, so later runs skip that work.
But why is plain PyTorch's result faster too?