PyTorch abnormal inference time

suke · March 11, 2019, 8:19am

Hi all,
I found an interesting problem during PyTorch inference. My validation set includes 100 images (1920x1080), and I want to check the forward time for every image. So I preprocess every image and append it to a list, then loop over the list to time the forward pass. I found that if I run 25 images, the total time is about the same as the inference time for one image. As shown below, only the first pass takes a long time; the following images take very little time. Does anybody know the reason?

Time: 0.4107s

Time: 0.0010s

Time: 0.0006s

Time: 0.0007s

Time: 0.0008s

Time: 0.0006s

Time: 0.0007s

Time: 0.0006s

Oli · March 11, 2019, 9:48am

It’s not uncommon for the first pass to take longer. Since you have already preprocessed the data, that is not affecting the time in this case. I’m guessing the forward pass has some kind of setup code. It’s a bit weird that 1 image takes the same amount of time as 25. Running 25 forward passes on separate images should take longer than one forward pass on a batch of 25 images, because with a batch the GPU can use larger matrix multiplications, which are more efficient.

Maybe your model is so tiny that the overhead of starting the inference dominates the time it takes to actually perform the computations?
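One way to separate the first-pass setup cost from the steady-state forward time is sketched below. It relies on a fact the thread does not mention: CUDA kernels are launched asynchronously, so a timer that stops right after `model(x)` may only measure the launch, not the computation. Whether that explains the numbers above is an assumption. The model, the input size, and the helper names `sync` and `time_forward` are hypothetical stand-ins, not the original poster's code.

```python
import time

import torch
import torch.nn as nn


def sync(device):
    # CUDA kernels run asynchronously; wait for them before reading the clock.
    if device.type == "cuda":
        torch.cuda.synchronize(device)


def time_forward(model, inputs, device, warmup=3):
    """Return per-input forward times in seconds, measured after warm-up passes."""
    model = model.to(device).eval()
    times = []
    with torch.no_grad():
        # Warm-up passes absorb one-time costs such as CUDA context creation.
        for x in inputs[:warmup]:
            model(x.to(device))
        sync(device)
        for x in inputs:
            x = x.to(device)
            sync(device)  # make sure the copy has finished before timing
            start = time.perf_counter()
            model(x)
            sync(device)  # make sure the forward pass has finished
            times.append(time.perf_counter() - start)
    return times


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Hypothetical tiny model and random inputs, used only for illustration.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 1))
inputs = [torch.randn(1, 3, 64, 64) for _ in range(5)]
times = time_forward(model, inputs, device)
for t in times:
    print(f"Time: {t:.4f}s")
```

If the per-image times become roughly equal once the warm-up and the synchronization are in place, the large first value was setup cost and the tiny later values were most likely launch times rather than full forward times.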

In conclusion, I don’t know why you are getting these results, but it doesn’t seem like a big problem or abnormal to me. The snakeviz library can help you get a bit more insight into your code’s execution time.
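For reference, snakeviz visualizes profiles written by Python's built-in `cProfile` module. A minimal sketch of producing such a profile follows; `forward_loop` is a hypothetical pure-Python stand-in for the real inference loop, and the file name `forward.prof` is arbitrary.

```python
import cProfile
import pstats


def forward_loop(n=1000):
    # Hypothetical stand-in for the inference loop being profiled.
    total = 0.0
    for _ in range(n):
        total += sum(j * j for j in range(100))
    return total


profiler = cProfile.Profile()
profiler.enable()
result = forward_loop()
profiler.disable()

# Save the stats; the file can then be opened with: snakeviz forward.prof
profiler.dump_stats("forward.prof")

# Print the ten most expensive calls by cumulative time as a quick text summary.
stats = pstats.Stats("forward.prof")
stats.sort_stats("cumulative").print_stats(10)
```

Note that `cProfile` only sees time spent on the Python side, so for GPU code it is most useful for finding overhead in data handling and control flow rather than in the kernels themselves.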

MariosOreo · March 11, 2019, 10:36am

Hi,
I have also run into a similar problem recently. I think this thread can help you.

suke · March 13, 2019, 2:54am

I tried the torch.cuda.empty_cache() function to resolve my problem.
Before:
1-25: 0.4s
25-50: 2-3s

After adding torch.cuda.empty_cache() once images 1-25 are done:
1-25: 0.4s
torch.cuda.empty_cache(): 2-3s
25-50: 0.4s

The forward pass is faster now, but empty_cache() itself takes too much time.
My guess is that images 1-25 fill up all of the GPU memory, so without empty_cache() the GPU has to check for and free memory before each new forward pass, whereas with empty_cache() it frees all the memory at once. My goal is to make this faster. Does anyone know how to resolve it?
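A hedged way to test this guess is to time each group of 25 images with explicit synchronization and to print PyTorch's memory counters, instead of calling empty_cache(). Because CUDA work is queued asynchronously, time that seems to belong to empty_cache() may actually be earlier forward passes finishing; that is an assumption here, not something confirmed in this thread. The model and inputs below are hypothetical stand-ins.

```python
import time

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_cuda = device.type == "cuda"

# Hypothetical small model and random inputs, used only for illustration.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).to(device).eval()
images = [torch.randn(1, 3, 64, 64) for _ in range(50)]

chunk_times = []
for first in range(0, len(images), 25):
    chunk = images[first:first + 25]
    if use_cuda:
        torch.cuda.synchronize()  # finish pending work before starting the clock
    start = time.perf_counter()
    with torch.no_grad():  # no autograd graph, so activations are not kept alive
        for x in chunk:
            out = model(x.to(device))
            score = out.mean().item()  # keep a Python float, not the GPU tensor
    if use_cuda:
        torch.cuda.synchronize()  # wait for this chunk's kernels to finish
    chunk_times.append(time.perf_counter() - start)
    print(f"{first + 1}-{first + len(chunk)}: {chunk_times[-1]:.4f}s")
    if use_cuda:
        # allocated: memory held by live tensors; reserved: memory cached by PyTorch
        print(torch.cuda.memory_allocated() // 2**20, "MiB allocated,",
              torch.cuda.memory_reserved() // 2**20, "MiB reserved")
```

If the allocated figure keeps growing from one group to the next, something is holding on to output tensors (for example a list of results, or a forward pass run without `torch.no_grad()`), and fixing that is usually cheaper than calling empty_cache().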

Oli · March 13, 2019, 6:47am

Wait, what? Did something change from your first post? Are you now saying that batches 1-25 take 0.4s and batches 25-50 take 2-3 seconds? So it gets slower as you go? What happens at batches 50-75, or 75-100?

How many batches does your dataset consist of? 100 validation images with a batch size of 1 = 100 batches?

MariosOreo · March 13, 2019, 6:48am

Could you post your snippet so we can reproduce your problem? That would be very helpful!