I’m running PyTorch 0.4.1 on Python 3.5. A paper I’m trying to reproduce claims a 13ms execution time for a ResNet50-based model on a GTX 1080 Ti using Caffe.
I’ve been able to translate the exact network to PyTorch using MMdnn. However, executing the ResNet part alone takes between 29 and 39ms on my end, using a GTX 1080, and the entire network takes between 35 and 54ms (I’m surprised it varies so much between subsequent executions; is that normal in PyTorch?). I’ve also tried torchvision’s resnet50 model for comparison, but the execution time is even worse: 53ms for the stub it has in common with my network and 58ms for the full ResNet50.
I understand that a GTX 1080 Ti is better than a 1080, but still the difference is too large. Unfortunately, I haven’t been able to run the network on Caffe on my machine for comparison, as Caffe is hell to compile (I need a custom layer).
Here is my code for reproduction:

    import numpy as np
    import torch
    from timeit import default_timer as timer
    from torchvision.models import resnet50


    def main():
        # Define model and input data
        resnet = resnet50().cuda()
        x = torch.from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32)).cuda()  # Entire network
        # x = torch.from_numpy(np.random.rand(1, 64, 32, 32).astype(np.float32)).cuda()  # Stub alone

        # The first pass is always slower, so run it once
        resnet.forward(x)

        # Measure elapsed time
        passes = 20
        total_time = 0
        for _ in range(passes):
            start = timer()
            resnet.forward(x)
            delta = timer() - start
            print('Forward pass: %.3fs' % delta)
            total_time += delta
        print('Average forward pass: %.3fs' % (total_time / passes))


    if __name__ == '__main__':
        main()

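One thing worth checking in a timing loop like this: CUDA kernels are launched asynchronously, so a host-side timer wrapped around a forward pass can stop before the GPU has finished its work, which may also explain some of the run-to-run variation. Below is a minimal sketch of a more careful measurement, not a definitive benchmark. The names `time_calls` and `benchmark_resnet50` are hypothetical helpers introduced here; switching the model to `eval()` mode, disabling autograd with `torch.no_grad()`, and enabling `torch.backends.cudnn.benchmark` are assumptions about what an inference-time comparison against Caffe would want, and none of them is guaranteed to close the gap to the reported 13ms.

```python
from timeit import default_timer as timer


def time_calls(fn, sync=None, warmup=3, passes=20):
    """Return a list of per-call wall-clock times (in seconds) for fn()."""
    if sync is None:
        sync = lambda: None  # no-op when fn does no asynchronous work
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(passes):
        sync()  # wait for any previously queued work before starting the clock
        start = timer()
        fn()
        sync()  # wait for the work launched by fn() before stopping the clock
        times.append(timer() - start)
    return times


def benchmark_resnet50(passes=20):
    """Hypothetical driver; requires torch, torchvision and a CUDA device."""
    import torch
    from torchvision.models import resnet50

    # Assumption: the input shape is fixed, so cuDNN autotuning can help.
    torch.backends.cudnn.benchmark = True
    # Assumption: inference mode is what should be compared against Caffe.
    model = resnet50().cuda().eval()
    x = torch.rand(1, 3, 224, 224).cuda()

    def forward():
        with torch.no_grad():  # skip autograd bookkeeping
            model(x)

    times = time_calls(forward, sync=torch.cuda.synchronize, passes=passes)
    for t in times:
        print('Forward pass: %.3fs' % t)
    print('Average forward pass: %.3fs' % (sum(times) / len(times)))
    return times
```

Calling `benchmark_resnet50()` on a machine with a GPU prints per-pass and average times in the same format as the script above. The timing helper takes the synchronization function as an argument so that it can be reused for other callables, including CPU-only ones.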
By "the stub", I mean the network with the following lines commented out in torchvision/models/resnet.py:

    def forward(self, x):
        # x = self.conv1(x)
        # x = self.bn1(x)
        # x = self.relu(x)
        # x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        # x = self.avgpool(x)
        # x = x.view(x.size(0), -1)
        # x = self.fc(x)

        return x