Hi !!
I am taking a pre-trained AlexNet model and using it to classify an image. However, a strange thing is happening. The time it takes to classify the image on the GPU comes out to 800-1000 milliseconds. But when I run this in a loop, the inference time drops to about 2 milliseconds for every iteration after the first one; the first one still takes the same time. I am using the same image for all iterations. I am doing it something like this:
import time
import torch

for i in range(100):
    pre_trained.cpu()                    # force the model back to the CPU so the transfer can be timed
    model_transfer_start = time.time()
    pre_trained.cuda(0)                  # transfer the model to GPU 0
    torch.cuda.synchronize()             # wait for the transfer to finish
    model_transfer_end = time.time()
    ans = pre_trained(img_variable)      # forward pass
    infer_end = time.time()              # note: no synchronize here, so the GPU may still be working
img_variable is the preprocessed image Variable.
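For reference, a minimal sketch of how img_variable might be prepared, assuming the standard torchvision ImageNet preprocessing for AlexNet (the image path is a placeholder; in current PyTorch a plain tensor works where Variable used to be required):

from PIL import Image
from torchvision import transforms

# Assumed standard ImageNet preprocessing; adjust to match your actual pipeline
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg")              # placeholder image path
img_variable = preprocess(img).unsqueeze(0)  # add batch dimension -> shape [1, 3, 224, 224]
img_variable = img_variable.cuda(0)          # keep the input on GPU 0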
The results are something like this (all times are in milliseconds):
I’m not answering your question, but I noticed that you have pre_trained.cpu() in your code. I assume that “pre_trained” is the name of your neural net? If so, I cannot understand why you move it to cpu() before moving it back to the GPU, and, worse, why you have all of this inside a loop. If you want to perform inference on the GPU, it would make more sense not to move your model to the CPU at all…
I think that the first 800-1000 ms is due to moving all necessary stuff onto your GPU, setting up buffers etc. (but I’m no specialist).
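A quick way to see that one-time cost in isolation is to time two consecutive transfers to the GPU (a rough sketch; the tensor size is arbitrary):

import time
import torch

x = torch.randn(1000, 1000)

start = time.time()
x = x.cuda(0)                         # the very first CUDA call also pays for context creation, driver init, etc.
torch.cuda.synchronize()
print("first transfer:  %.1f ms" % ((time.time() - start) * 1000))

start = time.time()
y = torch.randn(1000, 1000).cuda(0)   # a later transfer of the same amount of data is much cheaper
torch.cuda.synchronize()
print("second transfer: %.1f ms" % ((time.time() - start) * 1000))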
Actually, my motive is to measure the time it takes for the model to transfer from the CPU to the GPU, and then the inference time. So pre_trained.cpu() is there to make sure the model is back on the CPU every time before it is transferred to the GPU, since that transfer time is what I have to measure. And yes, img_variable is already on the GPU (.cuda()).
I also think it is because of loading the buffers etc. onto the GPU. In my experiment, I discarded the first inference time (see the timing sketch below).
Please feel free to correct me if I am wrong.
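For what it’s worth, here is a rough sketch of how both phases could be timed while keeping the one-time initialization out of the numbers (assuming torchvision’s AlexNet and an img_variable that is already on the GPU; the warm-up pass and the extra synchronize before stopping the inference timer are the main differences from the loop above):

import time
import torch
import torchvision.models as models

pre_trained = models.alexnet(pretrained=True).eval()

# Warm-up: run one forward pass on the GPU first so the one-time CUDA/cuDNN
# initialization cost is not counted in any measurement.
pre_trained.cuda(0)
with torch.no_grad():
    _ = pre_trained(img_variable)
torch.cuda.synchronize()

transfer_times, infer_times = [], []
for i in range(100):
    pre_trained.cpu()
    torch.cuda.synchronize()

    start = time.time()
    pre_trained.cuda(0)
    torch.cuda.synchronize()          # wait until the weights are actually on the GPU
    transfer_times.append(time.time() - start)

    start = time.time()
    with torch.no_grad():
        _ = pre_trained(img_variable)
    torch.cuda.synchronize()          # wait until the forward pass has actually finished
    infer_times.append(time.time() - start)

print("avg transfer:  %.2f ms" % (1000 * sum(transfer_times) / len(transfer_times)))
print("avg inference: %.2f ms" % (1000 * sum(infer_times) / len(infer_times)))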