Hello,
I ran the PyTorch profiler to measure the total CPU times for the following models:
Model name | CPU total time |
---|---|
ResNet18 | 37.236ms |
ProxylessNAS (CPU) | 69.561ms |
MobileNetV2 | 64.838ms |
The machine has a 14-core Intel Core i9-10940X CPU @ 3.30GHz.
Both ProxylessNAS (CPU) and MobileNetV2 are slower than ResNet18, but I was expecting the opposite.
The script is as follows. Am I missing anything?
import numpy as np
import torch
from torch.autograd import profiler

# model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True)
# model = torch.hub.load('mit-han-lab/ProxylessNAS', 'proxyless_cpu', pretrained=True)
model = torch.hub.load('pytorch/vision:v0.6.0', 'mobilenet_v2', pretrained=True)  # models.mobilenetv2()

inputs = torch.randn((1, 3, 224, 224))

with torch.no_grad():
    with profiler.profile(record_shapes=True) as prof:
        with profiler.record_function('model_inference'):
            model(inputs)

print(prof.key_averages().table(sort_by='cpu_time_total', row_limit=10))
The detailed outputs are as follows:
ResNet18
Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | Number of Calls |
---|---|---|---|---|---|---|
model_inference | 6.02% | 2.241ms | 100.00% | 37.236ms | 37.236ms | 1 |
conv2d | 0.17% | 63.016us | 46.30% | 17.240ms | 861.987us | 20 |
convolution | 0.20% | 75.310us | 46.13% | 17.177ms | 858.836us | 20 |
_convolution | 0.69% | 256.361us | 45.93% | 17.101ms | 855.071us | 20 |
mkldnn_convolution | 44.84% | 16.696ms | 45.14% | 16.810ms | 840.478us | 20 |
batch_norm | 0.21% | 78.296us | 31.22% | 11.626ms | 581.318us | 20 |
_batch_norm_impl_index | 0.22% | 80.086us | 31.01% | 11.548ms | 577.404us | 20 |
native_batch_norm | 18.85% | 7.018ms | 30.71% | 11.436ms | 571.781us | 20 |
max_pool2d | 0.05% | 17.052us | 12.65% | 4.709ms | 4.709ms | 1 |
max_pool2d_with_indices | 12.58% | 4.684ms | 12.60% | 4.692ms | 4.692ms | 1 |
Self CPU time total: 37.236ms
ProxylessNAS (CPU)
Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | Number of Calls |
---|---|---|---|---|---|---|
model_inference | 8.69% | 6.046ms | 100.00% | 69.561ms | 69.561ms | 1 |
conv2d | 0.20% | 140.364us | 44.30% | 30.818ms | 505.213us | 61 |
convolution | 0.20% | 137.516us | 44.10% | 30.678ms | 502.912us | 61 |
_convolution | 1.09% | 754.904us | 43.90% | 30.540ms | 500.657us | 61 |
batch_norm | 0.29% | 203.942us | 43.69% | 30.394ms | 498.257us | 61 |
_batch_norm_impl_index | 0.31% | 214.853us | 43.40% | 30.190ms | 494.914us | 61 |
native_batch_norm | 23.48% | 16.333ms | 42.96% | 29.885ms | 489.923us | 61 |
mkldnn_convolution | 42.20% | 29.355ms | 42.66% | 29.678ms | 486.520us | 61 |
select | 13.17% | 9.160ms | 17.69% | 12.303ms | 3.191us | 3855 |
as_strided | 3.23% | 2.247ms | 3.23% | 2.247ms | 0.582us | 3858 |
Self CPU time total: 69.561ms
MobileNetV2
Name | Self CPU total % | Self CPU total | CPU total % | CPU total | CPU time avg | Number of Calls |
---|---|---|---|---|---|---|
model_inference | 7.19% | 4.662ms | 100.00% | 64.838ms | 64.838ms | 1 |
batch_norm | 0.27% | 174.627us | 50.49% | 32.736ms | 629.534us | 52 |
_batch_norm_impl_index | 0.29% | 186.420us | 50.22% | 32.561ms | 626.176us | 52 |
native_batch_norm | 27.76% | 18.001ms | 49.82% | 32.301ms | 621.170us | 52 |
conv2d | 0.18% | 118.212us | 38.70% | 25.091ms | 482.519us | 52 |
convolution | 0.22% | 144.360us | 38.52% | 24.973ms | 480.245us | 52 |
_convolution | 1.23% | 800.032us | 38.29% | 24.828ms | 477.469us | 52 |
mkldnn_convolution | 36.46% | 23.642ms | 36.91% | 23.934ms | 460.270us | 52 |
select | 15.08% | 9.777ms | 20.58% | 13.342ms | 3.572us | 3735 |
as_strided | 3.94% | 2.553ms | 3.94% | 2.553ms | 0.683us | 3738 |
Self CPU time total: 64.838ms
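
For comparison, here is a minimal sketch of a steadier measurement. It assumes two things the script above does not do: calling `model.eval()` (modules are in training mode by default, so batch norm would otherwise compute batch statistics), and discarding a few warm-up passes before timing. The helper name `mean_latency_ms`, the `warmup` and `runs` defaults, and the tiny stand-in network are hypothetical, chosen only so the example runs without downloading weights. Whether this changes the ranking of the three models is not something the numbers above can confirm.

```python
import time

import torch


def mean_latency_ms(model, inputs, warmup=10, runs=50):
    """Return the mean wall-clock time of one forward pass, in milliseconds.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    model.eval()  # batch norm uses running statistics, dropout is disabled
    with torch.no_grad():
        # Warm-up passes are not timed, so one-time setup costs
        # (for example memory allocation) do not skew the average.
        for _ in range(warmup):
            model(inputs)
        start = time.perf_counter()
        for _ in range(runs):
            model(inputs)
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


if __name__ == "__main__":
    # Tiny stand-in network (an assumption for this sketch); swap in the
    # model returned by torch.hub.load(...) to measure the real networks.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
        torch.nn.BatchNorm2d(8),
        torch.nn.ReLU(),
    )
    inputs = torch.randn(1, 3, 224, 224)
    print(f"{mean_latency_ms(model, inputs):.3f} ms per forward pass")
```

Averaging over many passes also makes the comparison less sensitive to a single slow first call, which is all the one-shot profile above records.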