Hello everyone. I hope you are having a good day.
I don’t know why, but at some random point while training a model I receive this message:
“Thread 43 “train” received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff1bfff700 (LWP 24128)]
0x00007fffe0077d00 in void c10::function_ref<void (char**, long const*, long, long)>::callback_fn<at::TensorIteratorBase::loop_2d_from_1d<at::native::AVX2::copy_kernel(at::TensorIterator&, bool)::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda(char**, long const*, long)#1}>(at::native::AVX2::copy_kernel(at::TensorIterator&, bool)::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda(char**, long const*, long)#1} const&)::{lambda(char**, long const*, long, long)#1}>(long, char**, long const*, long, long) ()
from /home/libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu/libtorch/lib/libtorch_cpu.so”
When I run bt in gdb to get a backtrace, this is what I get: "
#0 0x00007fffe0077d00 in void c10::function_ref<void (char**, long const*, long, long)>::callback_fn<at::TensorIteratorBase::loop_2d_from_1d<at::native::AVX2::copy_kernel(at::TensorIterator&, bool)::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda(char**, long const*, long)#1}>(at::native::AVX2::copy_kernel(at::TensorIterator&, bool)::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda(char**, long const*, long)#1} const&)::{lambda(char**, long const*, long, long)#1}>(long, char**, long const*, long, long) ()
at /home/modjo/mike/dependences/libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu/libtorch/lib/libtorch_cpu.so
#1 0x00007fffdbc92d1f in at::TensorIteratorBase::serial_for_each(c10::function_ref<void (char**, long const*, long, long)>, at::Range) const ()
at /home/modjo/mike/dependences/libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu/libtorch/lib/libtorch_cpu.so
#2 0x00007fffdbc92ede in void at::internal::invoke_parallel<at::TensorIteratorBase::for_each(c10::function_ref<void (char**, long const*, long, long)>, long)::{lambda(long, long)#1}>(long, long, long, at::TensorIteratorBase::for_each(c10::function_ref<void (char**, long const*, long, long)>, long)::{lambda(long, long)#1} const&) [clone ._omp_fn.0] ()
at /home/modjo/mike/dependences/libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu/libtorch/lib/libtorch_cpu.so
#3 0x00007fffd766696e in () at /home/modjo/mike/dependences/libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu/libtorch/lib/libgomp-52f2fd74.so.1
#4 0x00007fffda12017a in start_thread () at /usr/lib64/libpthread.so.0
#5 0x00007fffd9e4fdc3 in clone () at /usr/lib64/libc.so.6"
I am using libtorch-cxx11-abi-shared-with-deps-2.0.1+cpu.