I was trying to calculate the inference time for a couple of batches.
Model being used: ResNet18
num_workers = 4
Image Size = 3, 1024, 3072 (Channels, Height, Width)
Here’s how I’m calculating the inference time:
```python
elapsed_time = 0.0

with torch.no_grad():
    for batch_num, data in enumerate(valloader, 0):
        # get the images and labels and move tensors to GPU
        inputs = data["image"]
        labels = data["label"].to(device)
        labels = labels.long()

        # synchronize so the timer measures only this batch
        torch.cuda.synchronize()
        start_time = time.time()
        # note: this also times the host-to-device copy of the inputs
        output = net(inputs.to(device))
        torch.cuda.synchronize()
        end_time = time.time()

        elapsed_time += (end_time - start_time)

execution_time = elapsed_time * 1000  # convert to milliseconds
avg_inference_time = execution_time / len(valloader)  # average ms per batch
print("Avg Inference Time: ", avg_inference_time)
```
This is the result I get:
Avg Inference Time: 60.62364959716797 for a batch size of 3
Avg Inference Time: 122.38671875 for a batch size of 6
Avg Inference Time: 103.451 for a batch size of 5
Avg Inference Time: 20.9023842 for a batch size of 1
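One way to sanity-check these numbers is to convert the per-batch averages into throughput (images per second). If the GPU had idle capacity at batch size 1, throughput would climb as the batch size grows; here it stays roughly flat, which suggests a single 3×1024×3072 image is already enough to saturate the device:

```python
# Average inference times reported above (ms per batch), keyed by batch size
avg_ms = {1: 20.9023842, 3: 60.62364959716797, 5: 103.451, 6: 122.38671875}

# throughput = images per second = batch_size / (seconds per batch)
throughput = {bs: bs / (ms / 1000.0) for bs, ms in avg_ms.items()}

for bs, ips in sorted(throughput.items()):
    print(f"batch size {bs}: {ips:.1f} images/sec")
# batch size 1: 47.8 images/sec
# batch size 3: 49.5 images/sec
# batch size 5: 48.3 images/sec
# batch size 6: 49.0 images/sec
```

Throughput is essentially constant at ~48–49 images/sec regardless of batch size, so the per-batch latency growing roughly linearly with batch size is exactly what you'd expect once the GPU is compute-bound.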
If things were running in parallel, shouldn’t the time be the same for any batch size, as long as the batch fits in GPU memory? Or is it that, because my resolution is quite high, the activations grow with batch size, so a larger batch takes more time for inference? Could anyone please throw some light on this topic?