“Segmentation fault (core dumped)” is all the information I have about the issue. Since the sysadmin is very disagreeable, I have to figure out the problem myself, but I really don’t know what could be causing the crash. This thread discussed the issue, but I couldn’t find a solution in it that works for me. I’m at my wits’ end, so I’m asking here for help. The following is the configuration of the machine:
Remote Linux with kernel version 5.8.0. I am not a superuser.
Python 3.8.6
CUDA Version: 11.1
GPU is RTX 3090 with driver version 455.23.05
CPU: Intel Core i9-10900K
PyTorch version: 1.8.0+cu111 (I found it at /usr/local/lib/python3.8/dist-packages/torch/version.py)
System-imposed RAM quota: 4 GB
System-imposed number of threads: 512198
System-imposed RLIMIT_NPROC value: 300
If you need other information related to this error, please let me know. Thank you for helping me troubleshoot this problem.
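(In case it helps, these limits can also be double-checked from inside Python with the standard resource module. A minimal sketch; the RLIMIT_AS line assumes the RAM quota is enforced as an address-space limit, which I haven’t confirmed:)

import resource

# Max number of processes/threads this user may create (soft, hard)
print('RLIMIT_NPROC:', resource.getrlimit(resource.RLIMIT_NPROC))  # expect (300, 300)

# Address-space limit in bytes, if the 4 GB RAM quota is enforced this way (assumption)
print('RLIMIT_AS:', resource.getrlimit(resource.RLIMIT_AS))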
@ptrblck
I installed gdb 11.1 on the remote Linux node. If I run the gdb commands below
gdb python3
r -c "import torch"
bt
I got the following output:
Starting program: /usr/bin/python3 -c "import torch"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 2451545]
[New Thread 0x7fff09280640 (LWP 2451546)]
[New Thread 0x7fff08a7f640 (LWP 2451547)]
[New Thread 0x7fff0027e640 (LWP 2451548)]
[New Thread 0x7ffeffa7d640 (LWP 2451549)]
Thread 3 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff08a7f640 (LWP 2451547)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fff09e23760 in blas_memory_alloc ()
   from /usr/local/lib/python3.8/dist-packages/numpy/core/../../numpy.libs/libopenblasp-r0-09e95953.3.13.so
#2  0x00007fff09e23f24 in blas_thread_server ()
   from /usr/local/lib/python3.8/dist-packages/numpy/core/../../numpy.libs/libopenblasp-r0-09e95953.3.13.so
#3  0x00007ffff7f90590 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x00007ffff7d0f223 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)
Can we find any clue about the segfault here? Please let me know what I should do next to probe the problem. Thanks.
Based on the backtrace, it seems that numpy’s libopenblas is causing the segfault. Did you install numpy with the PyTorch wheels? If not, install it, or update to the latest PyTorch release, as we’ve recently found this issue, which might be related.
numpy was installed in the system folder by the system admin, so I’m not sure whether it was installed together with PyTorch. But I can retrieve the installed version (via pip show numpy) as follows:
Name: numpy
Version: 1.19.5
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email:
License: BSD
Location: /usr/local/lib/python3.8/dist-packages
Requires:
Required-by: opencv-python, imageio, torchvision, torch, tensorflow, tensorboard, opt-einsum, Keras-Preprocessing, h5py
By “install it” do you mean install numpy? If so, I’ll try to install numpy in my user space. Would the latest numpy that pip installs do? And how do I make sure that the numpy in my user space is the one used when running import torch (i.e., will the system-wide numpy be masked)? Thank you.
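(To verify which copy wins, one simple check is to print where numpy is imported from; pip install --user should place it under ~/.local/lib/python3.8/site-packages, which Python normally searches before the system dist-packages. A minimal sketch:)

import numpy

print(numpy.__version__)
print(numpy.__file__)  # a path under ~/.local/... means the user-space copy masks the system one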
I installed numpy in my user space. The version is 1.21.3. Now when I run import torch, there are some new error messages:
import torch
OpenBLAS blas_thread_init: pthread_create failed for thread 10 of 20: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 300 current, 300 max
Segmentation fault (core dumped)
Is this because the RLIMIT_NPROC value is too small? 300 concurrent processes sounds large enough to me. If this is the cause of the segfault, how large should this number be? Thank you.
P.S. According to the error message, OpenBLAS failed at creating new threads. The system-imposed number of threads is 512198, which I think is also large enough. In addition, why does OpenBLAS have to create 20 threads (is OpenBLAS trying to build a thread pool)? Is it possible to create only 9, so that the failure would not occur (assuming it doesn’t affect performance)?
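(One hint about where the 20 comes from: OpenBLAS seems to size its default thread pool to the number of logical CPUs, and the i9-10900K exposes 20 of them (10 cores × 2 hyperthreads). A quick check using only the standard library:)

import os

# Logical CPU count; OpenBLAS typically defaults its pool to this value.
# On an i9-10900K this should print 20, matching "of 20" in the error.
print(os.cpu_count())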
I’m so glad to let you know that the problem is solved by adding the following lines before import torch:
import os
# Note: this must be set before the first import of numpy (torch imports numpy),
# or OpenBLAS will already have built its full-size thread pool.
os.environ['OPENBLAS_NUM_THREADS'] = '5'
Maybe different machines need different numbers. For the newly installed numpy 1.21.3, 5 is the maximum I can use; if the number is set to 6, I get “ImportError: numpy.core.multiarray failed to import”. For numpy 1.19.5, originally installed system-wide, the maximum is only 2; above 2, the import statement just crashes with a segfault. So installing the latest numpy is not strictly needed, but it can improve performance because the maximum allowed thread count is higher. Anyway, it’s solved! Thank you so much for your help, ptrblck.
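In case other native libraries spin up their own thread pools and hit the same RLIMIT_NPROC ceiling, here is a more general variant of the workaround. I only verified OPENBLAS_NUM_THREADS; the other variables are standard knobs for OpenMP, MKL, and numexpr, so treat them as untested here:

import os

# Cap common native thread pools before the first import of numpy/torch.
os.environ['OPENBLAS_NUM_THREADS'] = '5'  # verified above
os.environ['OMP_NUM_THREADS'] = '5'       # OpenMP (untested assumption)
os.environ['MKL_NUM_THREADS'] = '5'       # Intel MKL (untested assumption)
os.environ['NUMEXPR_NUM_THREADS'] = '5'   # numexpr (untested assumption)

import torch  # should now import without exhausting the process/thread quota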
Good to hear you were able to come up with a workaround! In case you are seeing the same segfault in OpenBLAS by purely importing numpy, it might be a good idea to create a GitHub issue in the numpy repository so that the devs are aware of it.