Hello!
I’m getting a segmentation fault while training a model with PyTorch. The model is a fully connected neural network.
My setup is:
The basic CUDA checks work correctly. In WSL, PyTorch sees the GPU and simple CUDA tensor operations run without problems:
- torch.cuda.is_available() == True
- torch.cuda.get_device_name(0) returns NVIDIA GeForce RTX 5070 Ti
- simple matrix multiplications on GPU work
However, when I run my actual training script, it starts correctly, trains for a while, and then after some epochs it crashes with:
Segmentation fault (core dumped)
The model is a standard PyTorch model built mostly from:
So at this point, I suspect one of these:
- a PyTorch/CUDA bug on WSL2 with RTX 5070 Ti
- some operation in backward or optimizer.step() triggering a native crash
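One way to narrow down the second suspicion is to record which phase of the training step was entered last before the crash. This is only a minimal sketch: the helper names `mark` and `last_phase` and the file name are hypothetical and do not come from the original script.

```python
import os

MARKER_PATH = "last_phase.log"  # hypothetical file name


def mark(phase, path=MARKER_PATH):
    """Overwrite the marker file with the current phase and force it to disk."""
    with open(path, "w") as f:
        f.write(phase + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the marker survives a hard crash


def last_phase(path=MARKER_PATH):
    """Return the phase that was recorded most recently."""
    with open(path) as f:
        return f.read().strip()


# Intended use inside a training loop (the loop itself is an assumption):
#   mark(f"epoch={epoch} step={step} forward")
#   out = model(x)
#   mark(f"epoch={epoch} step={step} backward")
#   loss.backward()
#   mark(f"epoch={epoch} step={step} optimizer.step")
#   optimizer.step()
```

After a crash, the content of the marker file shows the last phase that was started. Because CUDA kernels are launched asynchronously, the marker may point slightly past the real culprit for GPU-side errors; setting the environment variable CUDA_LAUNCH_BLOCKING=1 for the run should make the phases line up more closely.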
Try to grab a stack trace by running your script in gdb and post it here.
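As a complement to gdb, Python's built-in faulthandler module can dump the Python-level traceback when the process receives SIGSEGV, which may show which line of the script was executing. A minimal sketch, assuming a few lines can be added at the top of the training script; the log file name is hypothetical:

```python
import faulthandler

# Hypothetical log file name. Keep the handle open for the whole run,
# because faulthandler writes to its file descriptor at crash time.
fault_log = open("segfault_traceback.log", "w")

# On SIGSEGV (and SIGFPE, SIGABRT, SIGBUS, SIGILL) dump the Python
# traceback of every thread into fault_log.
faulthandler.enable(file=fault_log, all_threads=True)

# ... the rest of the training script would follow here ...
```

The same handler can be switched on without editing the script by starting it with python -X faulthandler, in which case the traceback goes to stderr.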
This is what I get after typing bt and then thread apply all bt:
(gdb) thread apply all bt
Thread 82 (Thread 0x7ffbf980b640 (LWP 700673) "pt_autograd_0"):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555568d82bb8) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x555568d82bb8) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x555568d82bb8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007ffff7ceba41 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555568d82bc0, cond=0x555568d82b90) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x555568d82b90, mutex=0x555568d82bc0) at ./nptl/pthread_cond_wait.c:627
#5 0x00007fffbcc80747 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007fff231b4abb in torch::autograd::ReadyQueue::pop() () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#7 0x00007fff231b910f in torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#8 0x00007fff231ae977 in torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#9 0x00007fff33211fa2 in torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#10 0x00007fffbccb0253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#11 0x00007ffff7cecac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#12 0x00007ffff7d7e8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 81 (Thread 0x7ffc1e83c640 (LWP 700645) "python"):
#0 0x00007ffff6c1de8e in gomp_barrier_wait_end () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libgomp.so.1
#1 0x00007ffff6c1b498 in gomp_thread_start () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libgomp.so.1
#2 0x00007ffff7cecac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#3 0x00007ffff7d7e8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 80 (Thread 0x7ffc1f03d640 (LWP 700644) "python"):
#0 0x00007ffff6c1de8e in gomp_barrier_wait_end () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libgomp.so.1
#1 0x00007ffff6c1b498 in gomp_thread_start () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libgomp.so.1
#2 0x00007ffff7cecac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#3 0x00007ffff7d7e8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 79 (Thread 0x7ffc1f83e640 (LWP 700643) "python"):
#0 0x00007ffff6c1de8e in gomp_barrier_wait_end () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libgomp.so.1
#1 0x00007ffff6c1b498 in gomp_thread_start () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libgomp.so.1
#2 0x00007ffff7cecac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#3 0x00007ffff7d7e8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 78 (Thread 0x7ffc2003f640 (LWP 700642) "python"):
#0 0x00007ffff6c1de8e in gomp_barrier_wait_end () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libgomp.so.1
#1 0x00007ffff6c1b498 in gomp_thread_start () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libgomp.so.1
#2 0x00007ffff7cecac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#3 0x00007ffff7d7e8d0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
...
Thread 1 (Thread 0x7ffff7c57000 (LWP 700532) "python"):
#0 0x00007fff32ecf974 in pybind11::gil_scoped_release::~gil_scoped_release() () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#1 0x00007fff32e895b2 in torch::autograd::THPVariable_reshape(_object*, _object*, _object*) () from /home/marcoledda2/venvs/torch/lib/python3.10/site-packages/torch/lib/libtorch_python.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Are you using any custom C++ extensions? If so, could you disable them for now?
Also, does torch.randn(4, 4).reshape(2, 8) fail when executed standalone?
No, I am not using any custom C++/CUDA extensions. I only have the standard PyTorch installation in a Python virtual environment. The Microsoft Visual C++ Redistributables are installed on Windows, but I am not using any custom PyTorch C++ extension or compiled module in this project.
I tested the standalone reshape example:
python -c "import torch; print(torch.__version__); print(torch.randn(4, 4).reshape(2, 8))"
and it works correctly.
I also tested the CUDA version:
python -c "import torch; print(torch.__version__); x=torch.randn(4,4,device='cuda'); print(x.reshape(2,8)); torch.cuda.synchronize(); print('ok')"
and it also works correctly.
I found a similar thread on this forum suggesting that the crash might be caused by BIOS settings related to CPU-GPU interaction and power states. I will post an update if I find the solution.