Why does inference time increase after a few iterations?

Hi, I'm seeing inference time increase after a few iterations, specifically when the batch size is large.

import time
import torch

# `model` is a CLIP model and `device` a CUDA device, loaded beforehand
a = torch.randn((1600, 3, model.visual.input_resolution, model.visual.input_resolution)).to(device=device)
for i in range(20):
    st = time.time()
    with torch.no_grad():
        model.encode_image(a)
    print(time.time() - st)

When I ran this code, the inference times printed out were as follows:

0.0164792537689209
0.015082120895385742
0.24733448028564453
0.4809727668762207
0.48241543769836426
0.4811861515045166
0.4822506904602051
0.48279619216918945
0.4839320182800293
0.48270678520202637
0.4844379425048828
0.4843134880065918
0.48419904708862305
0.48407626152038574
0.4829099178314209
0.48367929458618164
0.48392295837402344
0.4842257499694824
0.4861021041870117
0.4838073253631592

This only happens when the batch size (in my case 1600) is large. Memory usage does not seem heavy, but I wonder why the inference time suddenly increases. In fact, 0.48 seconds seems reasonable for a batch size of 1600, so why were the first couple of iterations so fast?

If you're running this on a GPU, you need to call torch.cuda.synchronize() before each call to time.time(), because GPU calls are asynchronous: the Python call returns as soon as the work is queued, so the first iterations appear fast because you are only measuring the launch, and later iterations appear slow once the queue fills up and each new launch has to wait for earlier work to finish. Add torch.cuda.synchronize() and measure again to see if the timings change.
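
For reference, here is a minimal sketch of the timing loop with synchronization added (assuming model and device are the same CLIP model and CUDA device as in your snippet):

import time
import torch

a = torch.randn((1600, 3, model.visual.input_resolution, model.visual.input_resolution)).to(device=device)
for i in range(20):
    torch.cuda.synchronize()  # make sure previously queued GPU work has finished
    st = time.time()
    with torch.no_grad():
        model.encode_image(a)
    torch.cuda.synchronize()  # wait for this forward pass to actually complete
    print(time.time() - st)

With the synchronize calls in place, every iteration should report roughly the same time, which is the actual per-batch GPU time rather than just the cost of queuing the work.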