I am trying to train on Linux (Python) and run inference on Windows with a C++ application linked against a static lib.
When calling torch::jit::script::Module::forward(), the following error occurs.
The same application linked against the DLL build does not fail.
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/Model.py", line 37, in forward
_19 = (_6).forward((_7).forward((_8).forward(_18, ), ), )
input0 = torch.cat([(_5).forward(_19, ), _15], 1)
_20 = (_3).forward((_4).forward(input0, ), )
~~~~~~~~~~~ <--- HERE
_21 = (_2).forward((_14).forward2(_20, ), )
return (_0).forward((_1).forward(_21, ), )
File "code/__torch__/CompactModel.py", line 36, in forward
_18 = (_9).forward((_10).forward(_17, ), )
_19 = (_6).forward((_7).forward((_8).forward(_18, ), ), )
input0 = torch.cat([(_5).forward(_19, ), _15], 1)
~~~~~~~~~ <--- HERE
_20 = (_3).forward((_4).forward(input0, ), )
_21 = (_2).forward((_14).forward2(_20, ), )
Traceback of TorchScript, original code (most recent call last):
/docker_share/source/Model.py(193): forward
/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py(534): _slow_forward
/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py(548): __call__
/usr/local/lib/python3.8/site-packages/torch/jit/__init__.py(1027): trace_module
/usr/local/lib/python3.8/site-packages/torch/jit/__init__.py(873): trace
./source/jitrace.py(700): <module>
RuntimeError: error in LoadLibraryA
The error message changed from “LoadLibraryA” to “GetProcAddress” when I placed “caffe2_nvrtc.dll” next to the application. caffe2_nvrtc.dll is created under build/bin, but caffe2_nvrtc.lib is not created.
Is caffe2_nvrtc.dll related to this problem?
Unlike .so and .a files on Linux, a .lib file on Windows is not necessarily a static lib. It can also be an import library for a DLL.
Hey, we don’t provide static libs currently. You’ll have to build that from source.
Let me rephrase what I wanted to say…
What I want to do is build a working static version of libtorch.
I have built static libtorch from source with “set BUILD_SHARED_LIBS=OFF”, BUT
caffe2_nvrtc.lib is NOT created.
caffe2_nvrtc.dll is created under torch/bin.
My assumption is as follows.
“SHARED” is set in add_library for caffe2_nvrtc, so a DLL is created. At the same time,
because BUILD_SHARED_LIBS=OFF, dllexport is not defined, so no .lib is created. Only the files under lib are linked into the application, and the error occurs.
I now know that the application succeeds when I use the official caffe2_nvrtc.lib and caffe2_nvrtc.dll.
I’d like to know how to create a static libtorch library.
I rewrote CMakeLists.txt and built a static version of libtorch. Now
caffe2_nvrtc.lib is created (and this should be a static lib, right?)
caffe2_nvrtc.dll is NOT created.
Linking the above into the app still ends with the error I described at the beginning.
Placing the official caffe2_nvrtc.dll next to the app makes it work. (Does that mean the static lib is not created correctly?)
Maybe some of the libs are optimized away. You could try passing /WHOLEARCHIVE:caffe2_nvrtc.lib in your project to force the linker to stop doing that.
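For context, code that a linker can drop from a static lib often looks like the self-registering object below. This is only a minimal sketch of that general pattern, not code from libtorch; registry, Registrar, register_nvrtc_stub and is_registered are hypothetical names made up for illustration, and whether this is what happens with caffe2_nvrtc.lib is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical registry that static initializers fill in before main() runs.
inline std::map<std::string, int>& registry() {
    static std::map<std::string, int> r;  // constructed on first use
    return r;
}

// Constructing a Registrar adds one entry to the registry.
struct Registrar {
    Registrar(const std::string& name, int id) { registry()[name] = id; }
};

// In a static lib this object would sit in its own .obj file. Nothing else
// refers to it by name, so the linker may leave that .obj out of the final
// binary unless the whole archive is forced in (e.g. with /WHOLEARCHIVE).
static Registrar register_nvrtc_stub("nvrtc_stub", 1);

// Lookup helper: true if something registered itself under this name.
inline bool is_registered(const std::string& name) {
    return registry().count(name) != 0;
}
```

Compiled as a single file, the registration always runs. The entry can only go missing when the object lives in a separate static lib and nothing in the application references it.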
I’ve tried a few other things, and the change below to caffe2/CMakeLists.txt worked!
diff --git a/caffe2/CMakeLists.txt b/caffe2/CMakeLists.txt
index 8025a7de3c..8e94978e72 100644
--- a/caffe2/CMakeLists.txt
+++ b/caffe2/CMakeLists.txt
@@ -561,7 +561,7 @@ if (NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE)
${TORCH_SRC_DIR}/csrc/cuda/comm.cpp
${TORCH_SRC_DIR}/csrc/jit/tensorexpr/cuda_codegen.cpp
)
- add_library(caffe2_nvrtc SHARED ${ATen_NVRTC_STUB_SRCS})
+ add_library(caffe2_nvrtc ${ATen_NVRTC_STUB_SRCS})
target_link_libraries(caffe2_nvrtc ${CUDA_NVRTC} ${CUDA_CUDA_LIB} ${CUDA_NVRTC_LIB})
target_include_directories(caffe2_nvrtc PRIVATE ${CUDA_INCLUDE_DIRS})
install(TARGETS caffe2_nvrtc DESTINATION "${TORCH_INSTALL_LIB_DIR}")
@@ -703,6 +703,9 @@ ELSEIF(USE_CUDA)
cuda_add_library(torch_cuda ${Caffe2_GPU_SRCS})
set(CUDA_LINK_LIBRARIES_KEYWORD)
torch_compile_options(torch_cuda) # see cmake/public/utils.cmake
+ if (NOT BUILD_SHARED_LIBS)
+ target_compile_definitions(torch_cuda PRIVATE USE_DIRECT_NVRTC)
+ endif()
if (USE_NCCL)
target_link_libraries(torch_cuda PRIVATE __caffe2_nccl)
In aten/src/ATen/cuda/detail/CUDAHooks.cpp, an “#ifdef USE_DIRECT_NVRTC” directive is used.
But “USE_DIRECT_NVRTC” was not defined in any CMakeLists.txt, and because of that,
an application linked with static libtorch tries to load “caffe2_nvrtc.dll”.
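To show why the missing definition matters, here is a minimal, self-contained sketch of that kind of compile-time switch. It is not the actual CUDAHooks.cpp code; NvrtcTable, direct_table, load_from_dll and get_nvrtc are hypothetical names, and the DLL lookup is only simulated with a flag.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a table of NVRTC entry points.
struct NvrtcTable {
    std::string source;  // records where the table came from
};

// Path used when the stub is linked straight into the binary.
inline NvrtcTable direct_table() {
    return NvrtcTable{"linked statically"};
}

// Path used when the stub is expected in a separate DLL. A real
// implementation would call LoadLibraryA/GetProcAddress here; this
// sketch only simulates the DLL being present or absent.
inline NvrtcTable load_from_dll(bool dll_present) {
    if (!dll_present) {
        throw std::runtime_error("error in LoadLibraryA");
    }
    return NvrtcTable{"loaded from caffe2_nvrtc.dll"};
}

// The macro decides at compile time which path is built in.
inline NvrtcTable get_nvrtc(bool dll_present) {
#ifdef USE_DIRECT_NVRTC
    (void)dll_present;  // unused on the direct path
    return direct_table();
#else
    return load_from_dll(dll_present);
#endif
}
```

Without -DUSE_DIRECT_NVRTC (or /DUSE_DIRECT_NVRTC with MSVC), get_nvrtc(false) throws, which mirrors the situation where the static build still looked for the DLL. With the macro defined, the DLL path is never compiled in.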