Hello,
When building PyTorch from source (no CUDA, no distributed, CPU only), I noticed a big difference in performance compared with the version I installed via pip (http://download.pytorch.org/whl/cpu/torch-0.4.0-cp27-cp27mu-linux_x86_64.whl): in my test the source build was about twice as slow.
After some trials, I managed to get similar performance by linking against the MKL library (installing mkl using pip and then building again).
I would like to know whether there is a link to the build script used to create the CPU-only wheel, or at least which flags and libraries are used?
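A quick way to compare two builds from Python is sketched below. It assumes a reasonably recent PyTorch: `torch.backends.mkl.is_available()`, `torch.get_num_threads()` and `torch.__config__.show()` may not all exist in 0.4.0, so each lookup is guarded. `describe_build` is a hypothetical helper name, not part of PyTorch.

```python
def describe_build():
    """Collect a few facts about the installed torch build, if any."""
    info = {
        "torch_version": None,
        "mkl_available": None,
        "num_threads": None,
        "config": None,
    }
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment; nothing to report.
        return info

    info["torch_version"] = torch.__version__

    # Assumption: torch.backends.mkl exists in this release.
    mkl = getattr(getattr(torch, "backends", None), "mkl", None)
    if mkl is not None and hasattr(mkl, "is_available"):
        info["mkl_available"] = mkl.is_available()

    # Number of threads used for intra-op CPU parallelism.
    if hasattr(torch, "get_num_threads"):
        info["num_threads"] = torch.get_num_threads()

    # Assumption: torch.__config__.show() exists (newer releases only).
    config = getattr(torch, "__config__", None)
    if config is not None and hasattr(config, "show"):
        info["config"] = config.show()

    return info


if __name__ == "__main__":
    for key, value in describe_build().items():
        print("{} = {}".format(key, value))
```

Running this once in the pip environment and once in the source-build environment should show whether the two differ in MKL support or thread count. Any value left as `None` means that lookup was not available.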
Example profiler output for the three versions (pip, source, source + MKL):
=== PyTorch 0.4.0 (CPU only) pip - python 2.7 ===
Timer unit: 1e-06 s
Total time: 4.231 s
File: extension_test.py
Function: forward at line 36
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    36                                           @profile
    37                                           def forward(self, x):
    38      5001    3640815.0    728.0     86.1      x = self.conv_layers(x)
    39                                               # print(x.shape)
    40      5001      36261.0      7.3      0.9      x = x.view(x.size(0), -1)
    41      5001     550743.0    110.1     13.0      x = self.fc(x)
    42      5001       3182.0      0.6      0.1      return x
=== Built from source v0.5.0a0+e6f7e18 (master branch) python 2.7 ===
Timer unit: 1e-06 s
Total time: 8.65896 s
File: extension_test.py
Function: forward at line 36
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    36                                           @profile
    37                                           def forward(self, x):
    38      5001    7399796.0   1479.7     85.5      x = self.conv_layers(x)
    39                                               # print(x.shape)
    40      5001      62640.0     12.5      0.7      x = x.view(x.size(0), -1)
    41      5001    1191392.0    238.2     13.8      x = self.fc(x)
    42      5001       5129.0      1.0      0.1      return x
=== Built from source with MKL support, python 2.7 ===
Timer unit: 1e-06 s
Total time: 4.33177 s
File: extension_test.py
Function: forward at line 36
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    36                                           @profile
    37                                           def forward(self, x):
    38      5001    3731289.0    746.1     86.1      x = self.conv_layers(x)
    39                                               # print(x.shape)
    40      5001      37394.0      7.5      0.9      x = x.view(x.size(0), -1)
    41      5001     560042.0    112.0     12.9      x = self.fc(x)
    42      5001       3047.0      0.6      0.1      return x
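To put numbers on the gap, the small sketch below divides each run's total time by the pip wheel's total. The totals are copied from the three profiler outputs above. The labels and the `slowdown_vs` helper are names chosen here for illustration only.

```python
# Total times in seconds, taken from the "Total time" lines above.
totals = {
    "pip wheel 0.4.0": 4.231,
    "source, no MKL": 8.65896,
    "source + MKL": 4.33177,
}


def slowdown_vs(baseline, times):
    """Return each run's total time divided by the baseline's total time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


ratios = slowdown_vs("pip wheel 0.4.0", totals)

if __name__ == "__main__":
    for name in sorted(ratios, key=ratios.get):
        print("{:<16s} {:.2f}x".format(name, ratios[name]))
```

The source build without MKL comes out at roughly 2.05x the pip wheel's time, while the MKL-linked build is within about 2.4% of it. That fits the idea that the official wheel links against MKL or a comparable BLAS, though it does not prove it.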