Segmentation Fault when importing PyTorch

When I tried to import PyTorch in Python, it crashed with a segmentation fault:

“Segmentation fault (core dumped)” is all the information I have about the issue. Since the sysadmin is very disagreeable, I have to figure out the problem myself, but I really don’t know what could be causing the crash. This thread discussed the issue, but I couldn’t find a solution in it that works for me. I’m at my wit’s end, so I’m asking here for help. Here is the configuration of the machine:

  • Remote Linux with kernel version 5.8.0. I am not a superuser.
  • Python 3.8.6
  • CUDA Version: 11.1
  • GPU is RTX 3090 with driver version 455.23.05
  • CPU: Intel Core i9-10900K
  • PyTorch version: 1.8.0+cu111 (I found it at /usr/local/lib/python3.8/dist-packages/torch/version.py)
  • System imposed RAM quota: 4GB
  • System imposed number of threads: 512198
  • System imposed RLIMIT_NPROC value: 300

If you need other information related to this error, please let me know. Thank you for helping me troubleshoot this problem.

Could you update to the latest PyTorch release (1.9.1) or the nightly and check if this solves the issue?

It’s a remote Linux machine. I am not the admin, so I don’t have the privileges to upgrade PyTorch.

You could try to grab the backtrace via:

gdb --args python -c "import torch"
...
run
...
bt

but given that you are not able to update PyTorch, I don’t know which access rights you have on this node to debug the issue.

@ptrblck
I installed gdb 11.1 on the remote Linux node. If I run the gdb commands below

gdb python3
r -c "import torch"
bt

I got the following output:

Starting program: /usr/bin/python3 -c "import torch"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 2451545]
[New Thread 0x7fff09280640 (LWP 2451546)]
[New Thread 0x7fff08a7f640 (LWP 2451547)]
[New Thread 0x7fff0027e640 (LWP 2451548)]
[New Thread 0x7ffeffa7d640 (LWP 2451549)]

Thread 3 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff08a7f640 (LWP 2451547)]
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007fff09e23760 in blas_memory_alloc ()
from /usr/local/lib/python3.8/dist-packages/numpy/core/../../numpy.libs/libopenblasp-r0-09e95953.3.13.so
#2 0x00007fff09e23f24 in blas_thread_server ()
from /usr/local/lib/python3.8/dist-packages/numpy/core/../../numpy.libs/libopenblasp-r0-09e95953.3.13.so
#3 0x00007ffff7f90590 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007ffff7d0f223 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

Can we find any clue about the segfault from this? Please let me know what I need to do next to probe the problem. Thanks.

Based on the backtrace, it seems that numpy’s libopenblas is creating the segfault. Did you install numpy with the PyTorch wheels? If not, install it or update to the latest PyTorch release, as we recently found this issue, which might be related.

numpy was installed in the system folder by the system admin, so I’m not sure whether it was installed together with PyTorch. But I can retrieve the installed version as follows:

Name: numpy
Version: 1.19.5
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email:
License: BSD
Location: /usr/local/lib/python3.8/dist-packages
Requires:
Required-by: opencv-python, imageio, torchvision, torch, tensorflow, tensorboard, opt-einsum, Keras-Preprocessing, h5py

By “install it”, do you mean install numpy? If so, I’ll try to install numpy in my user space. Will the latest numpy that pip installs do? And how do I make sure that the numpy installed in my user space is the one used when running import torch (i.e., will the system-wide numpy be masked)? Thank you.
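One check I can think of is to ask the interpreter which numpy it would actually import; a quick standard-library sketch (assuming a user-space install lands under ~/.local and precedes the system dist-packages on sys.path):

```python
import importlib.util

# Locate the numpy copy that "import numpy" would pick up
# (find_spec returns None if numpy is not on sys.path at all).
spec = importlib.util.find_spec("numpy")
if spec is None:
    print("numpy not found on sys.path")
else:
    # A path under ~/.local/lib/... means the user-space copy masks the
    # system one; a path under /usr/local/... means it does not.
    print(spec.origin)
```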

I installed numpy in my user space. The version is 1.21.3. Now when I run import torch, there are some new error messages:

import torch
OpenBLAS blas_thread_init: pthread_create failed for thread 10 of 20: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 300 current, 300 max
Segmentation fault (core dumped)

Is this because the RLIMIT_NPROC value is too small? I think 300 concurrent processes sounds large enough. If this is the cause of the segfault, how large should this number be? Thank you.

PS: according to the error message, OpenBLAS failed to create new threads. The system-imposed number of threads is 512198, which I think is also large enough. In addition, why does OpenBLAS have to create 20 threads (is it trying to build a thread pool)? Would it be possible to create only 9 so that the failure does not occur (assuming that does not affect performance)?
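For reference, the two numbers involved can be read from Python directly. The "thread 10 of 20" in the error suggests OpenBLAS sizes its thread pool to the number of logical CPUs (20 on a 10-core i9-10900K with hyper-threading), and on Linux each pthread counts as a task against the per-user RLIMIT_NPROC limit; a quick standard-library sketch:

```python
import os
import resource

# OpenBLAS defaults its thread-pool size to the logical CPU count.
print("logical CPUs:", os.cpu_count())

# Each thread OpenBLAS spawns counts against this per-user task limit,
# alongside every other process/thread the user already owns.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC soft/hard:", soft, hard)
```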

@ptrblck

I’m so glad to let you know that the problem is solved by adding the following lines before import torch:

import os
os.environ['OPENBLAS_NUM_THREADS'] = '5'

Maybe different machines need different numbers. For the newly installed numpy 1.21.3, 5 is the maximum I can reach; I get the error “ImportError: numpy.core.multiarray failed to import” if the number is set to 6. For the system-wide numpy 1.19.5, the maximum is only 2; above that, the import statement just crashes with a segfault. So installing the latest numpy is not required, but it can improve performance because the maximum allowed thread count is higher. Anyway, it’s solved! Thank you so much for your help, ptrblck.
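Spelled out a bit more fully, the workaround looks like this (the value 5 is specific to my machine, and the assignment only takes effect if it happens before libopenblas is loaded):

```python
import os

# Cap the OpenBLAS thread pool BEFORE the first import of numpy/torch;
# once libopenblas has been loaded, this variable is no longer consulted.
os.environ['OPENBLAS_NUM_THREADS'] = '5'  # machine-dependent; 5 works here

# import torch  # must come after the assignment above
print(os.environ['OPENBLAS_NUM_THREADS'])
```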

:slight_smile:

Good to hear you were able to come up with a workaround! In case you see the same segfault in OpenBLAS from merely importing numpy, it might be a good idea to create a GitHub issue in the numpy repository so the devs are aware of it.