I run resnet18 many times. When the input shape is fixed, e.g. (1, 3, 800, 800), inference takes about 2-3 ms on an RTX 4090. But when the shape changes on every call, inference takes 10+ ms.
PyTorch 2.4, CUDA 11.8, cuDNN 8.7
Did you make sure to synchronize the code if host timers are used? If so, are you using torch.compile or torch.backends.cudnn.benchmark = True?
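In case it helps, here is a minimal sketch of event-based, synchronized timing with both of those knobs enabled. The model, input shape, and iteration counts are just illustrative assumptions, not your actual setup:

```python
import torch
import torchvision

# cuDNN autotuning picks the fastest conv algorithm per input shape; the tuning
# itself reruns whenever a new shape is seen, so it mainly helps when shapes repeat.
torch.backends.cudnn.benchmark = True

model = torchvision.models.resnet18().cuda().eval()
# dynamic=True asks torch.compile to generate shape-generic code instead of
# recompiling for every new input shape.
model = torch.compile(model, dynamic=True)

x = torch.randn(1, 3, 800, 800, device="cuda")
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    for _ in range(10):            # warm-up (includes compilation / autotuning)
        model(x)
    torch.cuda.synchronize()
    start.record()
    model(x)
    end.record()
    torch.cuda.synchronize()       # ensure the timed kernels have finished
    print(f"{start.elapsed_time(end):.2f} ms")
```

Note that cudnn.benchmark re-tunes for every new input shape, so with constantly changing shapes it can add overhead rather than remove it, while torch.compile(dynamic=True) is meant to avoid per-shape recompilation.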