Hi, I'm facing a problem where inference time increases after a few iterations, specifically when the batch size is large.
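For context, model here is an OpenAI CLIP model and device is a CUDA device. I've left my exact setup out of the snippet, but a minimal equivalent (assuming, purely for illustration, the ViT-B/32 checkpoint; mine may differ) would be:

import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical checkpoint just to make the snippet self-contained
model, preprocess = clip.load("ViT-B/32", device=device)

The timing loop itself: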
import time
import torch

# Dummy batch of 1600 images at the model's input resolution
a = torch.randn((1600, 3, model.visual.input_resolution,
                 model.visual.input_resolution)).to(device=device)

for i in range(20):
    st = time.time()
    with torch.no_grad():
        model.encode_image(a)
    print(time.time() - st)
When I ran this code, the inference times were printed as below:
0.0164792537689209
0.015082120895385742
0.24733448028564453
0.4809727668762207
0.48241543769836426
0.4811861515045166
0.4822506904602051
0.48279619216918945
0.4839320182800293
0.48270678520202637
0.4844379425048828
0.4843134880065918
0.48419904708862305
0.48407626152038574
0.4829099178314209
0.48367929458618164
0.48392295837402344
0.4842257499694824
0.4861021041870117
0.4838073253631592
But this only happens when the batch size (1600 in my case) is large. Memory usage doesn't seem heavy, so I wonder why the inference time suddenly increases. In fact, 0.48 seconds seems like a normal time for a batch of 1600, so why was it so fast at the beginning?
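One possibility I'm wondering about: since CUDA kernels are launched asynchronously, time.time() right after encode_image might only be measuring the kernel-launch overhead until the driver's queue fills up, after which each call blocks. A sketch of how I could check this (assuming a CUDA device), forcing synchronization before each clock read:

import time
import torch

for i in range(20):
    torch.cuda.synchronize()   # drain any queued kernels before starting the timer
    st = time.time()
    with torch.no_grad():
        model.encode_image(a)
    torch.cuda.synchronize()   # block until encode_image actually finishes on the GPU
    print(time.time() - st)

If asynchronous execution is the explanation, I'd expect every iteration to report roughly 0.48 seconds here.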