Why is PyTorch's GPU utilization so low in production (NOT training)?

Hello, I am testing the inference process. When I have 25 images to process, the first image takes a noticeable amount of time, and the next 24 images take very little time. The interesting thing is that if I have 26 images, the 26th image takes about the same time as the first one. Could you help explain that?

1st image Time: 0.4107s

2nd-25th images Time: 0.0010s

26th image Time: 0.4206s
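
A minimal sketch of how per-image timings like these could be collected, in case it helps anyone reproduce the measurement. The helper name `time_per_item`, its `sync` parameter, and the `fake_model` stand-in are hypothetical and not taken from the original code. With PyTorch on a GPU, `torch.cuda.synchronize` would be the natural thing to pass as `sync`, since CUDA operations are queued asynchronously and a timer stopped without synchronizing may capture only the launch cost rather than the actual compute time.

```python
import time


def time_per_item(fn, items, sync=None):
    """Return the wall-clock seconds spent in fn(item) for each item.

    sync (hypothetical parameter): optional zero-argument callable invoked
    before starting and before stopping each timer, so that pending
    asynchronous work is included in the measurement. On a GPU this
    could be torch.cuda.synchronize (assumption: PyTorch with CUDA).
    """
    timings = []
    for item in items:
        if sync is not None:
            sync()  # drain work queued by earlier iterations
        start = time.perf_counter()
        fn(item)
        if sync is not None:
            sync()  # wait for this item's work to finish
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU or a real model.
    def fake_model(x):
        return sum(i * i for i in range(10_000))

    for idx, seconds in enumerate(time_per_item(fake_model, range(26)), start=1):
        print(f"image {idx}: {seconds:.4f}s")
```

Comparing the numbers with and without a `sync` callable would show whether the fast 0.0010s readings reflect finished work or only queued work.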