Difference in inference time between ResNet-50 from GitHub and the torchvision code

Hello, I'm trying to measure how much inference time drops when FLOPs are reduced by shrinking the input size.
However, the torchvision ResNet-50 does not seem to get faster as the input gets smaller.
So I tested a ResNet-50 from GitHub (address below), and that one does get faster with smaller inputs.

What causes the difference?
(Also, the GitHub code takes almost 4 times longer than torchvision at the same 300x300 input size.)

x-axis: input side length, tested with tensors of shape (1, 3, x, x)
y-axis: inference time
The orange line is the theoretical reduction expected from the change in FLOPs.
The blue line is the measured result.


  1. GitHub code (without the fully connected layer and the avgpool layer)
  2. torchvision code (with the fully connected layer and the avgpool layer replaced by Identity layers)

Could you disable cuDNN via torch.backends.cudnn.enabled = False and check whether the measured speed fits the theoretical curve better?

Here is the whole code. It can be run directly in a Colab notebook.

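import torchvision
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# torch.backends.cudnn.enabled = False   # uncomment to test without cuDNN
# torch.backends.cudnn.benchmark = False

# ResNet-50 without the global average pool and the fully connected head
model = torchvision.models.resnet50(weights=None)  # use pretrained=False on torchvision < 0.13
model.avgpool = nn.Identity()
model.fc = nn.Identity()
# model = ResNet50()  # GitHub implementation (not included here)

x_axis = range(50, 301, 10)
all_t_resnet = []
model = model.cuda().eval()
with torch.no_grad():
    for i in x_axis:
        x = torch.randn(1, 3, i, i).cuda()
        # warm-up runs so one-time setup cost is not included in the timing
        for _ in range(10):
            _ = model(x)
        t = []
        for _ in range(100):
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            _ = model(x)
            end.record()
            torch.cuda.synchronize()  # wait until both events have completed
            t.append(start.elapsed_time(end))  # elapsed time in milliseconds
        all_t_resnet.append(np.mean(t))

# FLOPs scale with the number of input pixels (x**2), so scale the
# measurement at the largest input size to get the theoretical curve
theoretical_resnet = (np.array(x_axis) / x_axis[-1]) ** 2 * all_t_resnet[-1]

plt.plot(x_axis, all_t_resnet)
plt.plot(x_axis, theoretical_resnet)
plt.xlabel('input side length')
plt.ylabel('inference time (ms)')
plt.legend(['experiment_resnet50', 'theoretical_resnet50'], loc='upper left')
plt.show()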
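Because CUDA kernels are launched asynchronously, a timer stopped right after model(x) returns may measure little more than the kernel launches. Below is a minimal, framework-agnostic sketch of a warm-up-then-median timing pattern. time_callable is a hypothetical helper, and the sync argument is where you would pass torch.cuda.synchronize for GPU work; the demo uses a plain CPU workload so it runs anywhere.

```python
import statistics
import time
from typing import Callable, Optional


def time_callable(
    fn: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 10,
    iters: int = 100,
) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    sync, if given, is called once after the warm-up and after every timed
    run (e.g. torch.cuda.synchronize), so queued asynchronous work is included.
    """
    # Warm-up runs are not timed (lazy initialisation, algorithm selection, caches).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued work to finish before stopping the clock
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in CPU workload; with PyTorch this could be, e.g.,
    # time_callable(lambda: model(x), sync=torch.cuda.synchronize)
    ms = time_callable(lambda: sum(range(100_000)), iters=20)
    print(f"median: {ms:.3f} ms")
```

PyTorch's own torch.utils.benchmark.Timer is designed for this kind of measurement and takes care of CUDA synchronization, so it may be the better choice in practice.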


Thank you for replying :grinning:

Yes, I tried it, but it did not change the result. Below is the result from Colab.

However, as the input size gets bigger, the measured and theoretical lines seem to converge.

(The images in the original post were produced on an RTX 2080 Ti, while the images in this reply are from a Colab Tesla P100.)

Thanks for the update!
For small inputs the runtime might plateau because of a constant overhead that does not depend on the input size, e.g. kernel launches, the dispatching mechanism, etc.
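One way to check this explanation is to assume the runtime is a constant overhead plus a term proportional to the number of pixels, t(x) ≈ c + k·x², and fit c and k to the measurements with least squares. This is a sketch under that assumption; fit_overhead_model and predict are hypothetical helper names. The orange theoretical line corresponds to c = 0, so a clearly positive c would be consistent with the plateau at small sizes.

```python
from typing import Sequence, Tuple


def fit_overhead_model(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of times ≈ c + k * size**2.

    c estimates the size-independent cost (kernel launches, dispatch, ...),
    k the cost that grows with the number of input pixels (i.e. with FLOPs).
    """
    u = [s * s for s in sizes]  # the model is linear in u = size**2
    n = len(u)
    mean_u = sum(u) / n
    mean_t = sum(times) / n
    cov = sum((ui - mean_u) * (ti - mean_t) for ui, ti in zip(u, times))
    var = sum((ui - mean_u) ** 2 for ui in u)
    k = cov / var
    c = mean_t - k * mean_u
    return c, k


def predict(c: float, k: float, size: float) -> float:
    """Predicted runtime for a square input with the given side length."""
    return c + k * size * size
```

With the measurements from the script, this would be called as c, k = fit_overhead_model(list(x_axis), all_t_resnet); comparing c with k * 50**2 shows how much of the small-input runtime is fixed overhead.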


Oh, I see, thanks.
I will try to find where the overhead comes from. :slightly_smiling_face: