Inference costs more time when the input shape changes

I run resnet18 many times. When the input shape is fixed, e.g. (1, 3, 800, 800), inference takes about 2-3 ms on an RTX 4090. But when the shape changes on each call, inference takes 10+ ms.
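A minimal sketch of the setup described above. A small conv stack stands in for resnet18 so the snippet needs no torchvision install, and the spatial sizes are scaled down; the fixed-shape loop reuses cached kernels while the varying-shape loop can trigger fresh setup work on every new shape:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for resnet18; inputs mirror the (1, 3, H, W)
# layout described above.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

with torch.no_grad():
    # Fixed shape: every iteration can reuse the same cached kernels.
    for _ in range(5):
        model(torch.randn(1, 3, 64, 64, device=device))

    # Varying shape: each new shape can trigger fresh setup work
    # (and, with cudnn.benchmark=True, a new algorithm search).
    for size in (64, 96, 128, 160):
        model(torch.randn(1, 3, size, size, device=device))
```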

PyTorch 2.4, CUDA 11.8, cuDNN 8.7

Did you make sure to synchronize the device before reading host timers? If so, are you using torch.compile or torch.backends.cudnn.benchmark = True? With benchmark mode enabled, cuDNN re-runs its algorithm search for every new input shape, which would explain the slowdown on shape changes.
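To illustrate the synchronization point: CUDA kernel launches are asynchronous, so a host timer read before the GPU has finished under-reports (or mis-attributes) the elapsed time. A hedged sketch of shape-aware timing, synchronizing on both sides of the measured region (the helper name and the small stand-in network are illustrative, not from the original post):

```python
import time
import torch
import torch.nn as nn

def timed_ms(model, x):
    """Return wall-clock inference time in ms, synchronizing around
    the measured region when a CUDA device is in use."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        model(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000.0

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()  # stand-in net
elapsed = timed_ms(model, torch.randn(1, 3, 32, 32))
```

Comparing these timings with `cudnn.benchmark` on and off, for fixed versus varying shapes, should show whether the extra latency comes from the per-shape algorithm search.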