How to run the model on GPU in C++

I have scripted my module in Python on the GPU, but when I try to run the model on the GPU in C++, I get this error:

terminate called after throwing an instance of 'c10::Error'
  what():  Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason.  The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols.  You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library. (initCUDA at /pytorch/aten/src/ATen/detail/CUDAHooksInterface.h:58)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7fc99a5be441 in /home/kevinhaoliu/libtorch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fc99a5bdd7a in /home/kevinhaoliu/libtorch/lib/libc10.so)
frame #2: <unknown function> + 0x8292ee (0x7fc9909c92ee in /home/kevinhaoliu/libtorch/lib/libcaffe2.so)
frame #3: void std::__once_call_impl<std::_Bind_simple<at::Context::lazyInitCUDA()::{lambda()#1} ()> >() + 0x2f (0x7fc99b1273df in /home/kevinhaoliu/libtorch/lib/libtorch.so.1)
frame #4: pthread_once + 0x50 (0x7fc98fd6cbb0 in /lib64/libpthread.so.0)
frame #5: void std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::Context::lazyInitCUDA()::{lambda()#1}&&) + 0x55 (0x7fc99b31cc55 in /home/kevinhaoliu/libtorch/lib/libtorch.so.1)
frame #6: void std::__once_call_impl<std::_Bind_simple<at::LegacyTypeDispatch::initForDeviceType(c10::DeviceType)::{lambda()#2} ()> >() + 0x31 (0x7fc9909a6661 in /home/kevinhaoliu/libtorch/lib/libcaffe2.so)
frame #7: pthread_once + 0x50 (0x7fc98fd6cbb0 in /lib64/libpthread.so.0)
frame #8: at::getType(c10::TensorOptions) + 0x268 (0x7fc9909a5178 in /home/kevinhaoliu/libtorch/lib/libcaffe2.so)
frame #9: at::native::to(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool) + 0x663 (0x7fc990baa033 in /home/kevinhaoliu/libtorch/lib/libcaffe2.so)
frame #10: at::TypeDefault::to(at::Tensor const&, c10::Device, c10::ScalarType, bool, bool) const + 0x1b (0x7fc990e2a3eb in /home/kevinhaoliu/libtorch/lib/libcaffe2.so)
frame #11: <unknown function> + 0xa7f133 (0x7fc99b256133 in /home/kevinhaoliu/libtorch/lib/libtorch.so.1)
frame #12: torch::jit::load(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x10d (0x7fc99b2570cd in /home/kevinhaoliu/libtorch/lib/libtorch.so.1)
frame #13: torch::jit::load(std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x68 (0x7fc99b2571f8 in /home/kevinhaoliu/libtorch/lib/libtorch.so.1)
frame #14: main + 0x92 (0x43f4fb in ./example-app)
frame #15: __libc_start_main + 0xf5 (0x7fc98f144b35 in /lib64/libc.so.6)
frame #16: ./example-app() [0x43e8b9]
Aborted (core dumped)

I have already added libc10_cuda.so to the link line, and I have also called module->to(torch::Device(torch::kCUDA, 0)); in my code, but it still doesn’t work.
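For reference, here is a minimal sketch of what my C++ program does (the model path and input shape are just placeholders for my real ones):

#include <torch/script.h>

#include <memory>
#include <vector>

int main() {
  // Load the TorchScript module exported from Python
  // ("model.pt" is a placeholder path).
  std::shared_ptr<torch::jit::script::Module> module =
      torch::jit::load("model.pt");

  // Move the module to GPU 0.
  module->to(torch::Device(torch::kCUDA, 0));

  // Build a dummy input on the same device and run one forward pass.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}).to(torch::kCUDA));
  at::Tensor output = module->forward(inputs).toTensor();

  return 0;
}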

My CUDA version is 9.0 and my cuDNN version is 7.

Did you follow the advice in the error message regarding linker options (i.e. adding -Wl,--no-as-needed)?
On Linux you can run objdump -x ./example-app | grep NEEDED to see which libraries your binary is actually linked against; a *_cuda.so entry should show up in that list.

Best regards

Thomas

Thank you for your reply, I have solved this problem with your help! Now I have another problem: when I try to run the model in C++, this warning comes up:
Warning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (_cudnn_impl at /pytorch/aten/src/ATen/native/cudnn/RNN.cpp:1266)

I find that my Python code shows the same warning: RuntimeWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). However, if I don’t use the scripting method and just run the model directly, the warning does not appear.

I tried adding self.rnn.flatten_parameters() to my Python code while scripting my model, but that raises another error. Do you have any idea about this?

Best regards

Kevin