Rebuilt libtorch on Ubuntu 16.04 is slower than the official library

Hi,

I rebuilt the libtorch C++ library (v1.0.1) using the /tools/build_libtorch.py script; unfortunately, my build is slower than the official version.
The following is my configuration output. Are there any tricks? Thanks for your help.
:grinning:

Have you found out the reason?
I built PyTorch from source on Windows 10. However, for both loading the model and running prediction, libtorch is 3 times slower than Caffe. I don't know why, and I'm quite confused.
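
When comparing the two frameworks, it may help to separate one-time start-up cost from steady-state prediction time, since the first few GPU calls typically include CUDA context creation and library setup. Below is a minimal sketch of such a measurement, written in Python for brevity. `benchmark` and the `predict` callable are hypothetical names and not part of libtorch or Caffe; the same pattern could be written in C++ with `std::chrono`. For GPU work the callable would also need to wait for the device to finish (for example with `torch.cuda.synchronize()` in PyTorch), because CUDA kernels launch asynchronously.

```python
import statistics
import time


def benchmark(predict, warmup=5, runs=20):
    """Time a zero-argument callable, discarding warm-up iterations.

    `predict` is a hypothetical stand-in for one forward pass; for GPU
    work it should block until the device has finished.
    """
    # Warm-up calls absorb one-time costs (lazy initialization, caches).
    for _ in range(warmup):
        predict()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        timings.append(time.perf_counter() - start)

    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
        "runs": runs,
    }


if __name__ == "__main__":
    # Dummy CPU workload standing in for a model's forward pass.
    result = benchmark(lambda: sum(i * i for i in range(10_000)))
    print(result)
```

Reporting the median rather than a single run makes the libtorch and Caffe numbers easier to compare, and timing model loading separately from prediction shows which of the two is actually slow.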

-- TORCH_VERSION : 1.1.0
-- CAFFE2_VERSION : 1.1.0
-- BUILD_ATEN_MOBILE : OFF
-- BUILD_ATEN_ONLY : OFF
-- BUILD_BINARY : False
-- BUILD_CUSTOM_PROTOBUF : ON
-- Link local protobuf : ON
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : True
-- Python version : 3.6.6
-- Python executable : C:/Users/qjhs/AppData/Local/Programs/Python/Python36/python.exe
-- Pythonlibs version : 3.6.6
-- Python library : C:/Users/qjhs/AppData/Local/Programs/Python/Python36/libs/python36.lib
-- Python includes : C:/Users/qjhs/AppData/Local/Programs/Python/Python36/include
-- Python site-packages: Lib/site-packages
-- BUILD_CAFFE2_OPS : True
-- BUILD_SHARED_LIBS : ON
-- BUILD_TEST : True
-- USE_ASAN : OFF
-- USE_CUDA : True
-- CUDA static link : False
-- USE_CUDNN : ON
-- CUDA version : 10.0
-- cuDNN version : 7.5.0
-- CUDA root directory : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0
-- CUDA library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/cuda.lib
-- cudart library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/cudart.lib
-- cublas library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/cublas.lib
-- cufft library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/cufft.lib
-- curand library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/curand.lib
-- cuDNN library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/cudnn.lib
-- nvrtc : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/lib/x64/nvrtc.lib
-- CUDA include path : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/include
-- NVCC executable : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0/bin/nvcc.exe
-- CUDA host compiler : F:/ProgramFiles/VS2017L/VC/Tools/MSVC/14.11.25503/bin/HostX64/x64/cl.exe
-- USE_TENSORRT : OFF
-- USE_ROCM : False
-- USE_EIGEN_FOR_BLAS :
-- USE_FBGEMM : OFF
-- USE_FFMPEG : False
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_LEVELDB : False
-- USE_LITE_PROTO : OFF
-- USE_LMDB : False
-- USE_METAL : OFF
-- USE_MKL : ON
-- USE_MKLDNN : OFF
-- USE_NCCL : False
-- USE_NNPACK : OFF
-- USE_NUMPY : ON
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : False
-- USE_OPENMP : ON
-- USE_PROF : OFF
-- USE_QNNPACK : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- USE_DISTRIBUTED : False
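
A summary like the one above can be checked mechanically for acceleration backends that were left off. Below is a minimal sketch, assuming each log line has the `NAME : value` shape shown above, with or without a leading `--` or `–`. The list of flags to watch is an assumption about what may affect CPU inference speed, not a confirmed cause of the slowdown, and `parse_summary` and `disabled_flags` are hypothetical helper names.

```python
import re

# Flags assumed (not confirmed) to matter for CPU inference speed.
WATCHED = (
    "USE_MKL",
    "USE_MKLDNN",
    "USE_FBGEMM",
    "USE_NNPACK",
    "USE_QNNPACK",
    "USE_OPENMP",
)

# Matches lines such as "-- USE_MKLDNN : OFF" or "– USE_MKLDNN : OFF".
# The key stops at the first colon, so Windows paths in the value survive.
LINE_RE = re.compile(r"^\s*(?:--|–)?\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.*?)\s*$")

# CMake treats an empty value as false, so it is counted as "off" here.
OFF_VALUES = {"off", "false", "0", "no", ""}


def parse_summary(text):
    """Return a dict mapping option names to their raw string values."""
    options = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            options[match.group("key")] = match.group("value")
    return options


def disabled_flags(options, watched=WATCHED):
    """List watched options that are present in the summary and switched off."""
    return [
        name
        for name in watched
        if name in options and options[name].lower() in OFF_VALUES
    ]


if __name__ == "__main__":
    # A few lines copied from the summary above.
    sample = """\
-- USE_FBGEMM : OFF
-- USE_MKL : ON
-- USE_MKLDNN : OFF
-- USE_NNPACK : OFF
-- USE_OPENMP : ON
-- USE_QNNPACK : OFF
"""
    print(disabled_flags(parse_summary(sample)))
```

For the configuration posted here, this flags `USE_MKLDNN`, `USE_FBGEMM`, `USE_NNPACK` and `USE_QNNPACK` as off, which could be a starting point when comparing against the official build's settings.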