Hello,
Currently, when downloading libtorch 1.8.1 (libtorch-win-shared-with-deps-1.8.1+cpu.zip), torch_cpu.dll is about 200 MB, which is quite large.
We don’t need Caffe2 or anything else for creating a model from the C++ side; we only want to run inference. Basically, the code is:
torch::jit::script::Module l_module = torch::jit::load( "TheModelPath" );
...
at::Tensor output = l_module.forward( l_inputs ).toTensor();
And that’s all.
I have built 1.8.1 from source with Visual Studio 2019, without Caffe2. The new DLL is about 68 MB, which is much better, but inference with that build is about 3 to 4 times slower.
I’m probably missing some options. Can somebody help me create a lightweight DLL that is still as fast as the original one from the website?
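To compare the two DLLs on equal footing, a small timing harness along these lines could be used. This is only a sketch: the helper name `median_latency_ms` and the warm-up and iteration counts are hypothetical choices and are not part of libtorch. It takes any callable, so the `forward` call from the snippet above can be wrapped in a lambda.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a libtorch API): calls `fn` `warmup` times untimed,
// then `iterations` times timed, and returns the median duration of one call
// in milliseconds.
template <typename Fn>
double median_latency_ms(Fn&& fn, std::size_t warmup, std::size_t iterations) {
    assert(iterations > 0);

    // Warm-up calls absorb one-time costs such as lazy initialization.
    for (std::size_t i = 0; i < warmup; ++i) {
        fn();
    }

    std::vector<double> samples;
    samples.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    // The median is less sensitive to occasional outliers than the mean.
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // upper median for even counts
}
```

With the names from the snippet above, a call could look like `median_latency_ms([&] { l_module.forward( l_inputs ); }, 10, 100)`, run once against each DLL with the same model and inputs.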
Here is the CMake output after configuration:
CMake Warning at CMakeLists.txt:273 (message):
TensorPipe cannot be used on Windows. Set it to OFF
std::exception_ptr is supported.
Current compiler supports avx2 extension. Will build perfkernels.
Current compiler supports avx512f extension. Will build fbgemm.
Building using own protobuf under third_party per request.
Use custom protobuf build.
3.11.4.0
Caffe2 protobuf include directory: $<BUILD_INTERFACE:C:/Dev/pytorch-1.8/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
Trying to find preferred BLAS backend of choice: MKL
MKL_THREADING = OMP
MKL_THREADING = OMP
CMake Warning at cmake/Dependencies.cmake:152 (message):
MKL could not be found. Defaulting to Eigen
Call Stack (most recent call first):
CMakeLists.txt:564 (include)
CMake Warning at cmake/Dependencies.cmake:175 (message):
Preferred BLAS (MKL) cannot be found, now searching for a general BLAS
library
Call Stack (most recent call first):
CMakeLists.txt:564 (include)
MKL_THREADING = OMP
Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - libiomp5md]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_intel_thread - mkl_core - libiomp5md]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_intel_thread - mkl_core]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_sequential - mkl_core]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_core - libiomp5md - pthread]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_core - libiomp5md - pthread]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_core - pthread]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_core - pthread]
Library mkl_intel: not found
Checking for [mkl - guide - pthread - m]
Library mkl: not found
MKL library not found
Checking for [Accelerate]
Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
Checking for [vecLib]
Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
Could not find OpenBLAS include. Turning OpenBLAS_FOUND off
Could not find OpenBLAS lib. Turning OpenBLAS_FOUND off
Checking for [openblas]
Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
Checking for [openblas - pthread - m]
Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
Checking for [libopenblas]
Library libopenblas: BLAS_libopenblas_LIBRARY-NOTFOUND
Checking for [goto2 - gfortran]
Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
Checking for [goto2 - gfortran - pthread]
Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
Checking for [acml - gfortran]
Library acml: BLAS_acml_LIBRARY-NOTFOUND
Checking for [blis]
Library blis: BLAS_blis_LIBRARY-NOTFOUND
Could NOT find Atlas (missing: Atlas_CBLAS_INCLUDE_DIR Atlas_CLAPACK_INCLUDE_DIR Atlas_CBLAS_LIBRARY Atlas_BLAS_LIBRARY Atlas_LAPACK_LIBRARY)
Checking for [ptf77blas - atlas - gfortran]
Library ptf77blas: BLAS_ptf77blas_LIBRARY-NOTFOUND
Checking for [blas]
Library blas: BLAS_blas_LIBRARY-NOTFOUND
Cannot find a library with BLAS API. Not using BLAS.
CMake Warning (dev) at C:/Program Files/CMake/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (OpenMP_C)
does not match the name of the calling package (OpenMP). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/Modules/FindOpenMP.cmake:565 (find_package_handle_standard_args)
third_party/fbgemm/CMakeLists.txt:59 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at C:/Program Files/CMake/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (OpenMP_CXX)
does not match the name of the calling package (OpenMP). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/Modules/FindOpenMP.cmake:565 (find_package_handle_standard_args)
third_party/fbgemm/CMakeLists.txt:59 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning at third_party/fbgemm/CMakeLists.txt:61 (message):
OpenMP found! OpenMP_C_INCLUDE_DIRS =
CMake Warning at third_party/fbgemm/CMakeLists.txt:136 (message):
==========
CMake Warning at third_party/fbgemm/CMakeLists.txt:137 (message):
CMAKE_BUILD_TYPE = Release
CMake Warning at third_party/fbgemm/CMakeLists.txt:138 (message):
CMAKE_CXX_FLAGS_DEBUG is /MDd /Z7 /Ob0 /Od /RTC1 /w /MP /bigobj
CMake Warning at third_party/fbgemm/CMakeLists.txt:139 (message):
CMAKE_CXX_FLAGS_RELEASE is /MD /O2 /Ob2 /DNDEBUG /w /MP /bigobj
CMake Warning at third_party/fbgemm/CMakeLists.txt:140 (message):
==========
** AsmJit Summary **
ASMJIT_DIR=C:/Dev/pytorch-1.8/pytorch/third_party/fbgemm/third_party/asmjit
ASMJIT_TEST=OFF
ASMJIT_TARGET_TYPE=SHARED
ASMJIT_DEPS=
ASMJIT_LIBS=asmjit
ASMJIT_CFLAGS=
ASMJIT_PRIVATE_CFLAGS=-MP;-GR-;-GF;-Zc:inline;-Zc:strictStrings;-Zc:threadSafeInit-;-W4
ASMJIT_PRIVATE_CFLAGS_DBG=-GS
ASMJIT_PRIVATE_CFLAGS_REL=-GS-;-O2;-Oi
Using third party subdirectory Eigen.
Using third_party/pybind11.
pybind11 include dirs: C:/Dev/pytorch-1.8/pytorch/cmake/../third_party/pybind11/include
Adding OpenMP CXX_FLAGS: -openmp:experimental
No OpenMP library needs to be linked against
Generated: C:/Dev/pytorch-1.8/pytorch/build/third_party/onnx/onnx/onnx_onnx_torch-ml.proto
Generated: C:/Dev/pytorch-1.8/pytorch/build/third_party/onnx/onnx/onnx-operators_onnx_torch-ml.proto
Generated: C:/Dev/pytorch-1.8/pytorch/build/third_party/onnx/onnx/onnx-data_onnx_torch.proto
******** Summary ********
CMake version : 3.18.4
CMake command : C:/Program Files/CMake/bin/cmake.exe
System : Windows
C++ compiler : C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29910/bin/Hostx64/x64/cl.exe
C++ compiler version : 19.28.29913.0
CXX flags : /DWIN32 /D_WINDOWS /GR /EHsc /w /MP /bigobj -DUSE_PTHREADPOOL -openmp:experimental
Build type : Release
Compile definitions : WIN32_LEAN_AND_MEAN;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1
CMAKE_PREFIX_PATH :
CMAKE_INSTALL_PREFIX : C:/Program Files (x86)/Torch
CMAKE_MODULE_PATH : C:/Dev/pytorch-1.8/pytorch/cmake/Modules
ONNX version : 1.8.0
ONNX NAMESPACE : onnx_torch
ONNX_BUILD_TESTS : OFF
ONNX_BUILD_BENCHMARKS : OFF
ONNX_USE_LITE_PROTO : OFF
ONNXIFI_DUMMY_BACKEND : OFF
ONNXIFI_ENABLE_EXT : OFF
Protobuf compiler :
Protobuf includes :
Protobuf libraries :
BUILD_ONNX_PYTHON : OFF
******** Summary ********
CMake version : 3.18.4
CMake command : C:/Program Files/CMake/bin/cmake.exe
System : Windows
C++ compiler : C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29910/bin/Hostx64/x64/cl.exe
C++ compiler version : 19.28.29913.0
CXX flags : /DWIN32 /D_WINDOWS /GR /EHsc /w /MP /bigobj -DUSE_PTHREADPOOL -openmp:experimental
Build type : Release
Compile definitions : WIN32_LEAN_AND_MEAN;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1
CMAKE_PREFIX_PATH :
CMAKE_INSTALL_PREFIX : C:/Program Files (x86)/Torch
CMAKE_MODULE_PATH : C:/Dev/pytorch-1.8/pytorch/cmake/Modules
ONNX version : 1.4.1
ONNX NAMESPACE : onnx_torch
ONNX_BUILD_TESTS : OFF
ONNX_BUILD_BENCHMARKS : OFF
ONNX_USE_LITE_PROTO : OFF
ONNXIFI_DUMMY_BACKEND : OFF
Protobuf compiler :
Protobuf includes :
Protobuf libraries :
BUILD_ONNX_PYTHON : OFF
Could not find CUDA with FP16 support, compiling without torch.CudaHalfTensor
Adding -DNDEBUG to compile flags
MAGMA not found. Compiling without MAGMA support
Could not find hardware support for NEON on this machine.
No OMAP3 processor on this machine.
No OMAP4 processor on this machine.
AVX compiler support found
AVX2 compiler support found
MKL_THREADING = OMP
Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - libiomp5md]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_intel_thread - mkl_core - libiomp5md]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_intel_thread - mkl_core]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_sequential - mkl_core]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_core - libiomp5md - pthread]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_core - libiomp5md - pthread]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_core - pthread]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_core - pthread]
Library mkl_intel: not found
Checking for [mkl - guide - pthread - m]
Library mkl: not found
MKL library not found
Checking for [Accelerate]
Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
Checking for [vecLib]
Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
Could not find OpenBLAS include. Turning OpenBLAS_FOUND off
Could not find OpenBLAS lib. Turning OpenBLAS_FOUND off
Checking for [openblas]
Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
Checking for [openblas - pthread - m]
Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
Checking for [libopenblas]
Library libopenblas: BLAS_libopenblas_LIBRARY-NOTFOUND
Checking for [goto2 - gfortran]
Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
Checking for [goto2 - gfortran - pthread]
Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
Checking for [acml - gfortran]
Library acml: BLAS_acml_LIBRARY-NOTFOUND
Checking for [blis]
Library blis: BLAS_blis_LIBRARY-NOTFOUND
Could NOT find Atlas (missing: Atlas_CBLAS_INCLUDE_DIR Atlas_CLAPACK_INCLUDE_DIR Atlas_CBLAS_LIBRARY Atlas_BLAS_LIBRARY Atlas_LAPACK_LIBRARY)
Checking for [ptf77blas - atlas - gfortran]
Library ptf77blas: BLAS_ptf77blas_LIBRARY-NOTFOUND
Checking for [blas]
Library blas: BLAS_blas_LIBRARY-NOTFOUND
Cannot find a library with BLAS API. Not using BLAS.
LAPACK requires BLAS
Cannot find a library with LAPACK API. Not using LAPACK.
disabling CUDA because NOT USE_CUDA is set
USE_CUDNN is set to 0. Compiling without cuDNN support
disabling ROCM because NOT USE_ROCM is set
MIOpen not found. Compiling without MIOpen support
MKL_THREADING = OMP
Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - libiomp5md]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_intel_thread - mkl_core - libiomp5md]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_intel_thread - mkl_core]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_sequential - mkl_core]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_core - libiomp5md - pthread]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_core - libiomp5md - pthread]
Library mkl_intel: not found
Checking for [mkl_intel_lp64 - mkl_core - pthread]
Library mkl_intel_lp64: not found
Checking for [mkl_intel - mkl_core - pthread]
Library mkl_intel: not found
Checking for [mkl - guide - pthread - m]
Library mkl: not found
MKL library not found
Checking for [Accelerate]
Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
Checking for [vecLib]
Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
Could not find OpenBLAS include. Turning OpenBLAS_FOUND off
Could not find OpenBLAS lib. Turning OpenBLAS_FOUND off
Checking for [openblas]
Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
Checking for [openblas - pthread - m]
Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
Checking for [libopenblas]
Library libopenblas: BLAS_libopenblas_LIBRARY-NOTFOUND
Checking for [goto2 - gfortran]
Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
Checking for [goto2 - gfortran - pthread]
Library goto2: BLAS_goto2_LIBRARY-NOTFOUND
Checking for [acml - gfortran]
Library acml: BLAS_acml_LIBRARY-NOTFOUND
Checking for [blis]
Library blis: BLAS_blis_LIBRARY-NOTFOUND
Could NOT find Atlas (missing: Atlas_CBLAS_INCLUDE_DIR Atlas_CLAPACK_INCLUDE_DIR Atlas_CBLAS_LIBRARY Atlas_BLAS_LIBRARY Atlas_LAPACK_LIBRARY)
Checking for [ptf77blas - atlas - gfortran]
Library ptf77blas: BLAS_ptf77blas_LIBRARY-NOTFOUND
Checking for [blas]
Library blas: BLAS_blas_LIBRARY-NOTFOUND
Cannot find a library with BLAS API. Not using BLAS.
MKLDNN_CPU_RUNTIME = OMP
CMake Warning (dev) at C:/Program Files/CMake/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (OpenMP_C)
does not match the name of the calling package (OpenMP). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/Modules/FindOpenMP.cmake:565 (find_package_handle_standard_args)
third_party/ideep/mkl-dnn/cmake/OpenMP.cmake:61 (find_package)
third_party/ideep/mkl-dnn/CMakeLists.txt:118 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at C:/Program Files/CMake/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (OpenMP_CXX)
does not match the name of the calling package (OpenMP). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/Modules/FindOpenMP.cmake:565 (find_package_handle_standard_args)
third_party/ideep/mkl-dnn/cmake/OpenMP.cmake:61 (find_package)
third_party/ideep/mkl-dnn/CMakeLists.txt:118 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
GPU support is disabled
Primitive cache is enabled
Found MKL-DNN: TRUE
Version: 7.0.3
Build type: Release
CXX_STANDARD: 14
Required features: cxx_variadic_templates
Not using libkineto in a non-CUDA build.
Could NOT find Backtrace (missing: Backtrace_LIBRARY Backtrace_INCLUDE_DIR)
don't use NUMA
Using ATen parallel backend: OMP
disabling CUDA because USE_CUDA is set false
AT_INSTALL_INCLUDE_DIR include/ATen/core
core header install: C:/Dev/pytorch-1.8/pytorch/build/aten/src/ATen/core/TensorBody.h
CMake Warning (dev) at C:/Program Files/CMake/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (OpenMP_C)
does not match the name of the calling package (OpenMP). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/Modules/FindOpenMP.cmake:565 (find_package_handle_standard_args)
caffe2/CMakeLists.txt:1020 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at C:/Program Files/CMake/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:273 (message):
The package name passed to `find_package_handle_standard_args` (OpenMP_CXX)
does not match the name of the calling package (OpenMP). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/Modules/FindOpenMP.cmake:565 (find_package_handle_standard_args)
caffe2/CMakeLists.txt:1020 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
pytorch is compiling with OpenMP.
OpenMP CXX_FLAGS: -openmp:experimental.
OpenMP libraries: .
Caffe2 is compiling with OpenMP.
OpenMP CXX_FLAGS: -openmp:experimental.
OpenMP libraries: .
CMake Warning at CMakeLists.txt:854 (message):
Generated cmake files are only fully tested if one builds with system glog,
gflags, and protobuf. Other settings may generate files that are not well
tested.
******** Summary ********
General:
CMake version : 3.18.4
CMake command : C:/Program Files/CMake/bin/cmake.exe
System : Windows
C++ compiler : C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29910/bin/Hostx64/x64/cl.exe
C++ compiler id : MSVC
C++ compiler version : 19.28.29913.0
CXX flags : /DWIN32 /D_WINDOWS /GR /EHsc /w /MP /bigobj -DUSE_PTHREADPOOL -openmp:experimental -DNDEBUG -DUSE_FBGEMM -DUSE_XNNPACK
Build type : Release
Compile definitions : WIN32_LEAN_AND_MEAN;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;_CRT_SECURE_NO_DEPRECATE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
CMAKE_PREFIX_PATH :
CMAKE_INSTALL_PREFIX : C:/Program Files (x86)/Torch
TORCH_VERSION : 1.8.0
CAFFE2_VERSION : 1.8.0
BUILD_CAFFE2 : OFF
BUILD_CAFFE2_OPS : OFF
BUILD_CAFFE2_MOBILE : OFF
BUILD_STATIC_RUNTIME_BENCHMARK: OFF
BUILD_TENSOREXPR_BENCHMARK: OFF
BUILD_BINARY : OFF
BUILD_CUSTOM_PROTOBUF : ON
Link local protobuf : ON
BUILD_DOCS : OFF
BUILD_PYTHON : OFF
BUILD_SHARED_LIBS : ON
CAFFE2_USE_MSVC_STATIC_RUNTIME : OFF
BUILD_TEST : OFF
BUILD_JNI : OFF
BUILD_MOBILE_AUTOGRAD : OFF
INTERN_BUILD_MOBILE :
USE_BLAS : 0
USE_LAPACK : 0
USE_ASAN : OFF
USE_CPP_CODE_COVERAGE : OFF
USE_CUDA : OFF
USE_ROCM : OFF
USE_EIGEN_FOR_BLAS : ON
USE_FBGEMM : ON
USE_FAKELOWP : OFF
USE_KINETO : OFF
USE_FFMPEG : OFF
USE_GFLAGS : OFF
USE_GLOG : OFF
USE_LEVELDB : OFF
USE_LITE_PROTO : OFF
USE_LMDB : OFF
USE_METAL : OFF
USE_PYTORCH_METAL : OFF
USE_FFTW : OFF
USE_MKL : OFF
USE_MKLDNN : ON
USE_NCCL : OFF
USE_NNPACK : OFF
USE_NUMPY : OFF
USE_OBSERVERS : ON
USE_OPENCL : OFF
USE_OPENCV : OFF
USE_OPENMP : ON
USE_TBB : OFF
USE_VULKAN : OFF
USE_PROF : OFF
USE_QNNPACK : OFF
USE_PYTORCH_QNNPACK : OFF
USE_REDIS : OFF
USE_ROCKSDB : OFF
USE_ZMQ : OFF
USE_DISTRIBUTED : OFF
USE_DEPLOY : OFF
Public Dependencies : Threads::Threads
Private Dependencies : pthreadpool;cpuinfo;XNNPACK;fbgemm;fp16;foxi_loader;fmt::fmt-header-only
Configuring done
Generating done
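The log above reports that MKL could not be found and that no BLAS library is used (`USE_MKL : OFF`, `USE_BLAS : 0`, `USE_EIGEN_FOR_BLAS : ON`), which may be worth checking first. To see which options change between two configuration runs, the `KEY : VALUE` summary lines could be parsed and compared. The sketch below assumes that line format. `trim`, `parse_summary` and `differing_keys` are hypothetical helper names, not part of CMake or PyTorch.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Removes spaces, tabs and carriage returns from both ends of a string.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    assert(last >= first);
    return s.substr(first, last - first + 1);
}

// Parses "KEY : VALUE" lines into a map, splitting at the first ':'.
// Lines without ':' are skipped; a repeated key keeps its last value.
std::map<std::string, std::string> parse_summary(const std::string& text) {
    std::map<std::string, std::string> options;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const auto sep = line.find(':');
        if (sep == std::string::npos) continue;
        const std::string key = trim(line.substr(0, sep));
        if (key.empty()) continue;
        options[key] = trim(line.substr(sep + 1));
    }
    return options;
}

// Returns the keys whose values differ, or that exist in only one map.
// Keys from `a` come first (sorted), followed by keys found only in `b`.
std::vector<std::string> differing_keys(
    const std::map<std::string, std::string>& a,
    const std::map<std::string, std::string>& b) {
    std::vector<std::string> keys;
    for (const auto& entry : a) {
        const auto it = b.find(entry.first);
        if (it == b.end() || it->second != entry.second) {
            keys.push_back(entry.first);
        }
    }
    for (const auto& entry : b) {
        if (a.find(entry.first) == a.end()) keys.push_back(entry.first);
    }
    return keys;
}
```

Because the split happens at the first colon, only the final summary block should be passed in. Warning lines such as `CMakeLists.txt:273` would otherwise produce meaningless keys.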
Thanks!