torch::load gives a Segmentation fault

#include <torch/torch.h>
#include <sstream>

int main() {
    std::stringstream stream;

    auto tensor = torch::ones({3, 4});
    torch::save(tensor, stream);

    torch::Tensor tensor2;
    torch::load(tensor2, stream);  // crashes here
}

The torch::load call gives a Segmentation fault (core dumped).

Similar code was working fine on another machine, but on my current computer it doesn't. What could be the problem?

If the same code is running fine on another machine, I would guess that you are using different PyTorch versions?
If so, which one is working and which one is failing?
Could you also post the gdb backtrace via:

gdb --args ./my_app
...
run
...
bt
#include <torch/torch.h>
#include <sstream>
#include <iostream>

int main(){
	std::stringstream stream;

	std::cout << "tensor1:" << std::endl;
	torch::Tensor tensor1 = torch::eye(3);
	torch::save(tensor1, stream);
	std::cout << tensor1 << std::endl;

	std::cout << "tensor2:" << std::endl;
	torch::Tensor tensor2;
	torch::load(tensor2, stream);
	std::cout << tensor2 << std::endl;
}

This is the code that I run. By the way, it works fine with this CMakeLists.txt:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(dcgan)

find_package(Torch REQUIRED)

add_executable(dcgan dcgan.cpp)
target_link_libraries(dcgan "${TORCH_LIBRARIES}")
set_property(TARGET dcgan PROPERTY CXX_STANDARD 14)

and gives this output:

tensor1:
 1  0  0
 0  1  0
 0  0  1
[ CPUFloatType{3,3} ]
tensor2:
 1  0  0
 0  1  0
 0  0  1
[ CPUFloatType{3,3} ]
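
(For reference, the standalone version is built the standard libtorch way, i.e. something like the commands below, assuming libtorch is unpacked under /opt/libtorch as the paths in the backtrace further down suggest:)

mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=/opt/libtorch ..
cmake --build . --config Release
./dcgan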

However, within the project that I'm working on, which has multiple CMakeLists files organized in a way that's too advanced for me, it gives me this:

tensor1:
 1  0  0
 0  1  0
 0  0  1
[ CPUFloatType{3,3} ]
tensor2:
Segmentation fault (core dumped)

And here is the backtrace you asked for:

GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
...
Reading symbols from ./src/algos/simple...done.
(gdb) run
Starting program: /home/baris/sdms/build/src/algos/simple 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
tensor1:
 1  0  0
 0  1  0
 0  0  1
[ CPUFloatType{3,3} ]
tensor2:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6a1b2cf in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) ()
   from /opt/libtorch/lib/libc10.so
(gdb) bt
#0  0x00007ffff6a1b2cf in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type)
    () from /opt/libtorch/lib/libc10.so
#1  0x00007ffff6a09fa6 in c10::Device::Device(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /opt/libtorch/lib/libc10.so
#2  0x00007fffe916493d in torch::jit::Unpickler::readInstruction() () from /opt/libtorch/lib/libtorch_cpu.so
#3  0x00007fffe9166840 in torch::jit::Unpickler::run() () from /opt/libtorch/lib/libtorch_cpu.so
#4  0x00007fffe9166df1 in torch::jit::Unpickler::parse_ivalue() () from /opt/libtorch/lib/libtorch_cpu.so
#5  0x00007fffe910acc2 in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&) () from /opt/libtorch/lib/libtorch_cpu.so
#6  0x00007fffe910afdd in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [clone .constprop.789] () from /opt/libtorch/lib/libtorch_cpu.so
#7  0x00007fffe910d905 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize(c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) [clone .constprop.788] ()
   from /opt/libtorch/lib/libtorch_cpu.so
#8  0x00007fffe910ded9 in torch::jit::load(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::all---Type <return> to continue, or q <return> to quit---q

Thanks for the update. Assuming you are using the latest stable release (1.7.1), could you update to the nightly release and rerun it?
If you are still seeing the issue, please create an issue on GitHub with a minimal executable code snippet.

Unfortunately, it still crashes with the nightly release. Is my previous comment enough as a minimal executable code snippet, given that it doesn't crash with the posted CMakeLists.txt but does crash with the setup that I can't share?

I don’t think so, as we would need to reproduce it.
Since the setup using the posted CMakeLists.txt with the code snippet works fine, I guess the issue is caused by your original setup.
If you cannot narrow it down, feel free to post the issue nevertheless, as someone might have already seen a similar issue and might be able to help.

We solved the issue.

That’s good to hear! What was the issue?

The order of inclusion of the libraries was PROJECT, Boost, Torch. When we changed it to Torch, Boost, PROJECT, the issue went away.
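
For anyone hitting the same crash: assuming that order refers to the target_link_libraries call, the change would look roughly like this (the target name and PROJECT_LIBS here are placeholders, not our real names):

# hypothetical sketch of the reordering; PROJECT_LIBS stands in for the project's own libraries
# before (crashed): target_link_libraries(simple ${PROJECT_LIBS} ${Boost_LIBRARIES} "${TORCH_LIBRARIES}")
# after (works):
target_link_libraries(simple "${TORCH_LIBRARIES}" ${Boost_LIBRARIES} ${PROJECT_LIBS})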