Strange timing behavior during image inference

Hi!
I am using a pre-trained AlexNet model to classify an image, and something strange is happening. Classifying the image on the GPU takes 800-1000 milliseconds. However, when I run it in a loop, inference takes only about 2 milliseconds per iteration after the first one; the first one still takes 800-1000 ms. I am using the same image for every iteration. I am doing something like this:

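for i in range(100):
    pre_trained.cpu()                    # put the model back on the CPU (not timed)
    model_transfer_start = time.time()
    pre_trained.cuda()                   # CPU -> GPU transfer that is being timed
    model_transfer_end = time.time()
    ans = pre_trained(img_variable)      # forward pass on the GPU
    infer_end = time.time()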

img_variable is the preprocessed image Variable.

The results look something like this (all times are in milliseconds):

Iteration 1: 872.12313
Iteration 2: 2.14568
Iteration 3: 2.43156
Iteration 4: 2.00987
Iteration 5: 2.01218
Iteration 6: 3.01121

and so on…
What am I doing wrong? And if nothing, why is there such a big gap?

I’m not answering your question, but I noticed that you have pre_trained.cpu() in your code. I assume that “pre_trained” is the name of your neural net? If so, I don’t understand why you move it to the CPU with cpu() before moving it back to the GPU and, worse, why all of this is inside a loop. If you want to perform inference on the GPU, it would make more sense not to move your model to the CPU…

I think that the first 800-1000 ms is due to moving all necessary stuff onto your GPU, setting up buffers etc. (but I’m no specialist).

Also, I assume that img_variable has been moved to the GPU with .cuda().
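One thing that may affect numbers like these: CUDA kernels are launched asynchronously, so reading time.time() right after the forward pass can capture only the launch, not the finished computation. Below is a minimal sketch of a timing helper that synchronizes before starting and before stopping the clock. The helper name `time_calls` and its parameters are hypothetical, not part of PyTorch; with PyTorch you would pass `torch.cuda.synchronize` as `sync`.

```python
import time


def time_calls(fn, n_iters=100, sync=None):
    """Return the wall-clock time of each call to fn, in milliseconds.

    sync (optional) is called before the timer starts and again before it
    stops, so that pending asynchronous work is included in the measurement.
    For CUDA work this would be torch.cuda.synchronize (assumption: the
    caller supplies it; this helper itself does not import torch).
    """
    times_ms = []
    for _ in range(n_iters):
        if sync is not None:
            sync()                      # drain any work queued earlier
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()                      # wait until fn's work has finished
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms
```

For the loop in the question, this could be called as `time_calls(lambda: pre_trained(img_variable), sync=torch.cuda.synchronize)`. The first entry would still be expected to include one-time GPU setup cost.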

Actually, my goal is to measure the time it takes to transfer the model from the CPU to the GPU, and then the inference time. So pre_trained.cpu() is there to make sure the model is back on the CPU every time before it is transferred to the GPU, since that transfer time is what I have to measure. Yes, img_variable is .cuda() :)
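If the goal is to report the transfer time and the inference time separately, one possible layout is to time the two phases with their own timestamps and keep the move back to the CPU outside the timed region. This is only a sketch: the function `time_transfer_and_inference` and its arguments are hypothetical names, and the callables are supplied by the caller.

```python
import time


def time_transfer_and_inference(to_cpu, to_gpu, infer,
                                sync=lambda: None, n_iters=10):
    """Return a list of (transfer_ms, inference_ms) pairs, one per iteration."""
    results = []
    for _ in range(n_iters):
        to_cpu()                        # reset: model back on the CPU (not timed)
        sync()
        t0 = time.perf_counter()
        to_gpu()                        # phase 1: CPU -> GPU transfer
        sync()
        t1 = time.perf_counter()
        infer()                         # phase 2: forward pass
        sync()
        t2 = time.perf_counter()
        results.append(((t1 - t0) * 1000.0, (t2 - t1) * 1000.0))
    return results
```

With the names from this thread, the arguments would be `to_cpu=pre_trained.cpu`, `to_gpu=pre_trained.cuda`, `infer=lambda: pre_trained(img_variable)` and `sync=torch.cuda.synchronize`.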

Same thing happens to me, and I am looking for a solution to speed up the first inference.

Does anyone know any solutions?

I also think it is because of loading the buffers etc. onto the GPU. In my experiment, I discarded the first inference time.
Please feel free to correct me if I am wrong :)
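Discarding the first measurement can be written down as a small helper that reports the warm-up iterations separately from the steady-state statistics. This is a sketch only; `summarize` and `n_warmup` are hypothetical names, and the sample values are the six timings posted at the top of the thread.

```python
import statistics


def summarize(times_ms, n_warmup=1):
    """Split off the first n_warmup timings and summarize the rest."""
    steady = times_ms[n_warmup:]
    return {
        "warmup_ms": times_ms[:n_warmup],
        "mean_ms": statistics.mean(steady),
        "median_ms": statistics.median(steady),
    }


# Timings from the original post, in milliseconds.
times_ms = [872.12313, 2.14568, 2.43156, 2.00987, 2.01218, 3.01121]
summary = summarize(times_ms)
print(summary)
```

Reporting the median alongside the mean makes the summary less sensitive to occasional slow iterations such as the 3 ms one above.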