I ran into a similar issue: a TorchScript model traced with PyTorch 1.6 and loaded by a C++ program linked against libtorch 1.7.0 runs successfully on CPU, but crashes with a segmentation fault on CUDA:
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007f8c68de0e01 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) bt
#0 0x00007f8c68de0e01 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007f8c68cf9747 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007f8c68cf9b2e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007f8c68ec6442 in cuLaunchKernel () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007f8cddaa5b2e in torch::jit::tensorexpr::CudaCodeGen::CompileToNVRTC(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#5 0x00007f8cddaae94c in torch::jit::tensorexpr::CudaCodeGen::Initialize() () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#6 0x00007f8cddab36e8 in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#7 0x00007f8cdc7c8ef6 in torch::jit::tensorexpr::CreateCodeGen(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::tensorexpr::Stmt*, std::vector<torch::jit::tensorexpr::CodeGen::BufferArg, std::allocator<torch::jit::tensorexpr::CodeGen::BufferArg> > const&, c10::Device) () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#8 0x00007f8cdc84e4ba in torch::jit::tensorexpr::TensorExprKernel::compile() () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#9 0x00007f8cdc84ea22 in torch::jit::tensorexpr::TensorExprKernel::TensorExprKernel(std::shared_ptr<torch::jit::Graph> const&) () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#10 0x00007f8cdc6c8cfd in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#11 0x00007f8cdc71145e in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#12 0x00007f8cdc710f63 in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#13 0x00007f8cdc713512 in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#14 0x00007f8cdc71385f in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#15 0x00007f8cdc713a1f in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#16 0x00007f8cdc710eb5 in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#17 0x00007f8cdc713512 in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#18 0x00007f8cdc709e68 in torch::jit::Code::Code(std::shared_ptr<torch::jit::Graph> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long) () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#19 0x00007f8cdc72967c in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#20 0x00007f8cdc728706 in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#21 0x00007f8cdc6f9fb5 in ?? () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#22 0x00007f8cdc416b7a in torch::jit::GraphFunction::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#23 0x00007f8cdc4260e5 in torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
#24 0x00007f8cda1803e0 in torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >) () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
When the C++ program is linked against PyTorch 1.6 instead, the model runs successfully on both CPU and GPU.
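
The backtrace above is long and mostly unresolved `??` frames, so here is a minimal sketch for pulling out only the frames gdb could name. It uses just the Python standard library; `FRAME_RE` and `resolved_frames` are names I made up for illustration, and the regex assumes the frame layout shown in this backtrace (`#N 0xADDR in SYMBOL () from LIBRARY`, one frame per line).

```python
import re

# Matches gdb frames such as:
#   #3 0x00007f8c68ec6442 in cuLaunchKernel () from /usr/lib/.../libcuda.so.1
# Assumption: each frame sits on a single line in this exact layout.
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+ in (.+?) \(.*\) from (\S+)$")


def resolved_frames(backtrace: str):
    """Return (frame_number, symbol, library_name) for frames gdb could name."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a frame line, or a frame in a different layout
        number, symbol, library = match.groups()
        if symbol == "??":
            continue  # unresolved frame: no symbol information available
        # Keep only the file name of the shared library, not the full path.
        frames.append((int(number), symbol, library.rsplit("/", 1)[-1]))
    return frames


# Three frames copied from the backtrace above.
sample = """\
#2 0x00007f8c68cf9b2e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007f8c68ec6442 in cuLaunchKernel () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007f8cddaae94c in torch::jit::tensorexpr::CudaCodeGen::Initialize() () from /home/gemfield/pydeepvac.cpython-36m-x86_64-linux-gnu.so
"""

for number, symbol, library in resolved_frames(sample):
    print(f"#{number:<3} {symbol}  [{library}]")
```

Run over the full backtrace, this leaves the `cuLaunchKernel` frame in `libcuda.so.1` sitting directly under the `torch::jit::tensorexpr::CudaCodeGen` frames, which suggests (but does not prove) that the crash happens while the TensorExpr CUDA code generator is setting up a fused kernel.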