Why does the inference speed of the same model vary so unreasonably with different input tensor sizes?

My device: Win10, GTX 1080 Ti, i7-8700K
My model: yolov5s
I tested the inference speed with two different input sizes, big (1×3×800×1376) and small (1×3×416×736), in both GPU and CPU mode. The results are as follows.
| compute device | input size | FPS |
| -------------- | ------------------- | --- |
| 1080 Ti | big: 1×3×800×1376 | 49 |
| 1080 Ti | small: 1×3×416×736 | 43 |
| i7-8700K | big: 1×3×800×1376 | 1.3 |
| i7-8700K | small: 1×3×416×736 | 5.6 |
To record timestamps, I use this function:

```python
import time
import torch

def time_synchronized():
    # Wait for all queued CUDA kernels to finish before reading the clock
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()
```
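A minimal, self-contained version of my benchmark might look like the sketch below (assumptions: a stand-in `nn.Conv2d` replaces the loaded yolov5s model, and a few warm-up runs are done first so one-time costs such as cuDNN autotuning and memory allocation are excluded from the timed loop):

```python
import time
import torch
import torch.nn as nn

def time_synchronized():
    # Wait for all queued CUDA kernels to finish before reading the clock
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

def benchmark(model, shape, warmup=3, iters=10):
    """Return average latency in seconds over `iters` timed runs."""
    device = next(model.parameters()).device
    x = torch.randn(shape, device=device)
    with torch.no_grad():
        for _ in range(warmup):   # untimed warm-up runs
            model(x)
        t0 = time_synchronized()
        for _ in range(iters):
            model(x)
        t1 = time_synchronized()
    return (t1 - t0) / iters

# Hypothetical stand-in model; substitute the real yolov5s model here
model = nn.Conv2d(3, 16, 3, padding=1)
for shape in [(1, 3, 416, 736), (1, 3, 800, 1376)]:
    dt = benchmark(model, shape)
    print(f"{shape}: {dt * 1000:.2f} ms/img, {1 / dt:.1f} FPS")
```

Without the warm-up runs, the first measured call can dominate the total and make one input size look unexpectedly slow or fast.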

I cannot understand these numbers: on the GPU the big input actually gets a higher FPS than the small one, while on the CPU it is more than four times slower. Thank you for your reply.