Hi.
I’m running inference with the VGG16 and MobileNetV3-Small models on Google Colab using an NVIDIA Tesla T4 GPU, with PyTorch 2.2.1 and cuDNN 8906. When measuring the average inference time per image with the code below, I got the following results:
VGG16: 14.38 ms per image
MobileNetV3-Small: 17.39 ms per image
So VGG16 is faster than MobileNetV3-Small on the GPU, even though MobileNet is considered more efficient and requires far fewer GFLOPs than VGG16 (0.06 vs. 15.47).
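For reference, those GFLOP figures can be reproduced roughly like this (a minimal sketch assuming the fvcore package is installed; fvcore counts multiply-adds at a 224x224 input, which is how the published figures are usually quoted):

import torch
from torchvision.models import vgg16, mobilenet_v3_small
from fvcore.nn import FlopCountAnalysis  # assumption: fvcore is installed

dummy = torch.randn(1, 3, 224, 224)
for name, model in [("VGG16", vgg16()), ("MobileNetV3-Small", mobilenet_v3_small())]:
    model.eval()
    flops = FlopCountAnalysis(model, dummy)
    print(f"{name}: {flops.total() / 1e9:.2f} GFLOPs")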
However, when inference is run on the CPU, MobileNet is much faster than VGG16: 24 ms vs. 906 ms per image, respectively.
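(For completeness, CPU inference can be timed with an ordinary wall-clock loop like the sketch below, reusing the model and test_dataloader names from the GPU code further down; CPU ops run synchronously, so no CUDA events are needed:)

import time
import torch

model = model.to("cpu").eval()
total, n = 0.0, 0
with torch.no_grad():
    for inputs, labels, paths in test_dataloader:
        start = time.perf_counter()
        _ = model(inputs)  # synchronous on CPU, so wall-clock time is accurate
        total += time.perf_counter() - start
        n += inputs.size(0)
print("Average CPU time per image:", 1000 * total / n, "ms")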
How can this be explained?
The batch size used is 1.
import numpy as np
import torch
from tqdm import tqdm

model.eval()
dummy_input = torch.randn(1, 3, 256, 256, dtype=torch.float).to(device)
starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
timings = np.zeros((len(test_datasets), 1))

with torch.no_grad():
    # GPU WARM-UP: a few forward passes so clocks and caches reach steady state
    for _ in range(10):
        _ = model(dummy_input)

    # MEASURE PERFORMANCE
    loop = tqdm(test_dataloader)  # progress bar
    rep = 0
    for inputs, labels, paths in loop:
        inputs = inputs.to(device)
        labels = labels.to(device)
        starter.record()
        outputs = model(inputs)
        ender.record()
        # WAIT FOR GPU SYNC: elapsed_time is only valid once both events have completed
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender)  # milliseconds
        timings[rep] = curr_time
        rep += 1

mean_syn = np.sum(timings) / len(test_datasets)
std_syn = np.std(timings)
print('\nAverage time per image : ', mean_syn)
print('Total inference time : ', np.sum(timings))
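As a sanity check, the same forward pass can also be timed with torch.utils.benchmark, which handles warm-up and CUDA synchronization internally (a sketch on the dummy input only, not the measurement that produced the numbers above):

import torch
import torch.utils.benchmark as benchmark

x = torch.randn(1, 3, 256, 256, device=device)
timer = benchmark.Timer(stmt="model(x)", globals={"model": model, "x": x})
m = timer.timeit(100)  # runs model(x) 100 times with proper synchronization
print(f"Mean per forward pass: {m.mean * 1000:.2f} ms")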