PyTorch CPU hangs on nn.Linear

Hello,

When trying to run the example here, PyTorch hangs indefinitely at the line

 x = F.relu(self.fc1(x))
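Since the linked example is not reproduced here, this is a minimal sketch of the kind of forward pass that hangs; the layer sizes and class name are illustrative assumptions, not the tutorial's exact values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)  # sizes are illustrative only

    def forward(self, x):
        x = F.relu(self.fc1(x))  # hangs here on the affected setup
        return x

net = Net()
out = net(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 32])
```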

This runs in a virtual environment created by conda (Python 3.6.5), with PyTorch installed via

conda install pytorch-cpu torchvision-cpu -c pytorch

The virtual environment exports GCC 4.9.0 and glibc-2.14.1.

Here’s the output of the conda list command:

# Name                    Version                   Build  Channel
ca-certificates           2018.03.07                    0
certifi                   2018.4.16                py36_0
cffi                      1.11.5           py36h9745a5d_0
freetype                  2.8                  hab7d2ae_1
intel-openmp              2018.0.0                      8
jpeg                      9b                   h024ee3a_2
libedit                   3.1                  heed3624_0
libffi                    3.2.1                hd88cf55_4
libgcc-ng                 7.2.0                hdf63c60_3
libgfortran-ng            7.2.0                hdf63c60_3
libpng                    1.6.34               hb9fc6fc_0
libstdcxx-ng              7.2.0                hdf63c60_3
libtiff                   4.0.9                h28f6b97_0
mkl                       2018.0.2                      1
mkl_fft                   1.0.1            py36h3010b51_0
mkl_random                1.0.1            py36h629b387_0
ncurses                   6.0                  h9df7e31_2
ninja                     1.8.2                h6bb024c_1
numpy                     1.14.2           py36hdbf6ddf_1
olefile                   0.45.1                   py36_0
openssl                   1.0.2o               h20670df_0
pillow                    5.1.0            py36h3deb7b8_0
pip                       10.0.1                   py36_0
pycparser                 2.18             py36hf9f622e_1
python                    3.6.5                hc3d631a_2
pytorch-cpu               0.4.0                py36_cpu_1    pytorch
readline                  7.0                  ha6073c6_4
setuptools                39.1.0                   py36_0
six                       1.11.0           py36h372c433_1
sqlite                    3.23.1               he433501_0
tk                        8.6.7                hc745277_3
torchvision-cpu           0.2.1                    py36_1    pytorch
wheel                     0.31.0                   py36_0
xz                        5.2.3                h5e939de_4
zlib                      1.2.11               ha838bed_2

Searching online suggests this hang may be due to CUDA incompatibility issues, but I have no CUDA installed and am using the pytorch-cpu build. I also tried exporting

NO_CUDA=1 
CUDA_VISIBLE_DEVICES=

with no luck.

The OS is CentOS 6.8 (no sudo access, unfortunately).

Any ideas on what could be causing this?

Thanks!

S

Can you try running your script under gdb and reporting the backtrace? For example:

$ gdb --args python my_script.py
...
Reading symbols from python...done.
(gdb) run
...
<ctrl-c>
(gdb) backtrace
...

Here you go. Thank you!

(fairseq) bash-4.1$ gdb --args python
build/                    fairseq.egg-info/         .interactive.py.swp       README.md                 tests/
CONTRIBUTING.md           fairseq.gif               LICENSE                   requirements.txt          train.py
data/                     generate.py               multiprocessing_train.py  score.py                  wmt14.en-de.fconv-py/
distributed_train.py      .git/                     PATENTS                   scripts/                  wmt14.en-fr.fconv-py/
example.py                .gitignore                preprocess.py             setup.py
fairseq/                  interactive.py            __pycache__/              singleprocess_train.py
(fairseq) bash-4.1$ gdb --args python example.py
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-90.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/bin/python...done.
(gdb) run
Starting program: /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/bin/python example.py
[Thread debugging using libthread_db enabled]
Missing separate debuginfo for /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/numpy/../../../libiomp5.so
Detaching after fork from child process 24591.
[New Thread 0x7fffe535c780 (LWP 29388)]
[New Thread 0x7fffe4f5b800 (LWP 29396)]
[New Thread 0x7fffe4b5a880 (LWP 29400)]
[New Thread 0x7fffe4759900 (LWP 29405)]
[New Thread 0x7fffe4358980 (LWP 29409)]
[New Thread 0x7fffe3f57a00 (LWP 29414)]
[New Thread 0x7fffe3b56a80 (LWP 29418)]
[New Thread 0x7fffe3755b00 (LWP 29423)]
[New Thread 0x7fffe3354b80 (LWP 29427)]
[New Thread 0x7fffe2f53c00 (LWP 29432)]
[New Thread 0x7fffe2b52c80 (LWP 29436)]
Before
^C
Program received signal SIGINT, Interrupt.
pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162     62:     movl    (%rsp), %edi
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.192.el6.x86_64
(gdb) backtrace
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ffff2083ce9 in __kmp_suspend_64 ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#2  0x00007ffff201ffc2 in _INTERNAL_25_______src_kmp_barrier_cpp_34128d84::__kmp_hyper_barrier_gather(barrier_type, kmp_info*, int, int, void (*)(void*, void*), void*) ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#3  0x00007ffff2023647 in __kmp_join_barrier(int) ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#4  0x00007ffff2052e72 in __kmp_internal_join ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#5  0x00007ffff2052666 in __kmp_join_call ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#6  0x00007ffff20267f7 in __kmpc_fork_call ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/numpy/../../../libiomp5.so
#7  0x00007fffeeeeb157 in mkl_blas_sgemv_omp ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/mkl_fft/../../../libmkl_intel_thread.so
#8  0x00007fffeed67208 in mkl_blas_sgemv ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/mkl_fft/../../../libmkl_intel_thread.so
#9  0x00007fffeeea1d68 in mkl_blas_sgemm ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/mkl_fft/../../../libmkl_intel_thread.so
#10 0x00007ffff0fc32f1 in sgemm_ ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/mkl_fft/../../../libmkl_intel_lp64.so
#11 0x00007ffff5a06890 in sgemm_ ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/numpy/core/../../../../libmkl_rt.so
#12 0x00007fffe90acd11 in THFloatBlas_gemm ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/lib/libATen.so
#13 0x00007fffe8d5eaa1 in THFloatTensor_addmm ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/lib/libATen.so
#14 0x00007fffe8b5dc29 in at::CPUFloatType::s_addmm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Scalar, at::Scalar) const ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/lib/libATen.so
#15 0x00007fffe9e034e2 in torch::autograd::VariableType::s_addmm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Scalar, at::Scalar) const () at torch/csrc/autograd/generated/VariableType.cpp:7500
#16 0x00007fffe8c2f2f8 in at::Type::addmm(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Scalar, at::Scalar) const ()
   from /gpfs/nlu/data/filesets/projects/here-work/sergey_mkrtchyan/anaconda3/envs/fairseq/lib/python3.6/site-packages/torch/lib/libATen.so
#17 0x00007fffe9ebc076 in torch::autograd::THPVariable_addmm(_object*, _object*, _object*) ()
    at /opt/conda/conda-bld/pytorch-cpu_1524582300956/work/torch/lib/tmp_install/include/ATen/TensorMethods.h:682
#18 0x00007ffff7e4eb94 in _PyCFunction_FastCallDict ()
#19 0x00007ffff7ede67c in call_function ()
#20 0x00007ffff7f00cba in _PyEval_EvalFrameDefault ()
#21 0x00007ffff7ed7a94 in _PyEval_EvalCodeWithName ()
#22 0x00007ffff7ed8941 in fast_function ()
#23 0x00007ffff7ede755 in call_function ()
#24 0x00007ffff7f00cba in _PyEval_EvalFrameDefault ()
#25 0x00007ffff7ed8d7b in _PyFunction_FastCallDict ()
#26 0x00007ffff7e4ef5f in _PyObject_FastCallDict ()
#27 0x00007ffff7e53a03 in _PyObject_Call_Prepend ()
#28 0x00007ffff7e4e99e in PyObject_Call ()
#29 0x00007ffff7f02470 in _PyEval_EvalFrameDefault ()
#30 0x00007ffff7ed7a94 in _PyEval_EvalCodeWithName ()
#31 0x00007ffff7ed8e1b in _PyFunction_FastCallDict ()
#32 0x00007ffff7e4ef5f in _PyObject_FastCallDict ()
#33 0x00007ffff7e53a03 in _PyObject_Call_Prepend ()
#34 0x00007ffff7e4e99e in PyObject_Call ()
#35 0x00007ffff7eab9b7 in slot_tp_call ()
#36 0x00007ffff7e4ed7b in _PyObject_FastCallDict ()
#37 0x00007ffff7ede7ce in call_function ()
#38 0x00007ffff7f00cba in _PyEval_EvalFrameDefault ()
#39 0x00007ffff7ed8d7b in _PyFunction_FastCallDict ()
#40 0x00007ffff7e4ef5f in _PyObject_FastCallDict ()
#41 0x00007ffff7e53a03 in _PyObject_Call_Prepend ()
#42 0x00007ffff7e4e99e in PyObject_Call ()
#43 0x00007ffff7f02470 in _PyEval_EvalFrameDefault ()
#44 0x00007ffff7ed7a94 in _PyEval_EvalCodeWithName ()
#45 0x00007ffff7ed8e1b in _PyFunction_FastCallDict ()
#46 0x00007ffff7e4ef5f in _PyObject_FastCallDict ()
#47 0x00007ffff7e53a03 in _PyObject_Call_Prepend ()
#48 0x00007ffff7e4e99e in PyObject_Call ()
#49 0x00007ffff7eab9b7 in slot_tp_call ()
#50 0x00007ffff7e4ed7b in _PyObject_FastCallDict ()
#51 0x00007ffff7ede7ce in call_function ()
#52 0x00007ffff7f00cba in _PyEval_EvalFrameDefault ()
#53 0x00007ffff7ed9459 in PyEval_EvalCodeEx ()
#54 0x00007ffff7eda1ec in PyEval_EvalCode ()
#55 0x00007ffff7f549a4 in run_mod ()
#56 0x00007ffff7f54da1 in PyRun_FileExFlags ()
#57 0x00007ffff7f54fa4 in PyRun_SimpleFileExFlags ()
#58 0x00007ffff7f58a9e in Py_Main ()
#59 0x00007ffff7e204be in main ()

Are you using multiprocessing? I’ve run into deadlock issues with multiprocessing and OpenMP. Here are a few things to try:

  1. Try adding import multiprocessing; multiprocessing.set_start_method('spawn') at the very beginning of your program. The default start method (“fork”) can have problems with threads.

  2. Try running with the environment variable OMP_NUM_THREADS=1. This may be slower, but I think it should avoid the OpenMP deadlocks.
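Both workarounds can be sketched in one self-contained script; the worker function and pool size are illustrative placeholders, and the torch import is omitted so the sketch stays runnable on its own:

```python
import multiprocessing
import os

# Workaround 2: cap OpenMP at a single thread. This must be in the
# environment before torch/MKL spins up its thread pool, so exporting it
# in the shell ($ OMP_NUM_THREADS=1 python example.py) is most reliable;
# setting it here, before importing torch, usually works too.
os.environ["OMP_NUM_THREADS"] = "1"

def square(x):
    return x * x

# Workaround 1: "spawn" starts children from a fresh interpreter instead
# of fork(), so they do not inherit the parent's OpenMP/MKL thread state.
if __name__ == "__main__":
    multiprocessing.set_start_method("spawn", force=True)
    with multiprocessing.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]
```

Note that set_start_method must run before any Pool or Process is created, which is why it belongs at the very top of the program.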

Thank you, setting OMP_NUM_THREADS=1 solves the problem.

Does this mean that PyTorch uses multiprocessing by default? And do you know what could cause a deadlock on that basic tutorial?

PyTorch uses OpenMP by default (not multiprocessing). I see from your backtrace that the conda environment is named “fairseq”. I mentioned multiprocessing because I know the fairseq project uses multiprocessing.

I’m not sure why you got a deadlock.
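You can see the OpenMP threading directly from torch; a quick check (not a fix), assuming any reasonably recent build:

```python
import torch

# PyTorch's CPU kernels (via MKL/OpenMP) use a thread pool by default;
# this reports how many threads they will use.
print(torch.get_num_threads())

# Roughly equivalent in effect to exporting OMP_NUM_THREADS=1 before
# launch, but applied from inside the running process.
torch.set_num_threads(1)
print(torch.get_num_threads())  # 1
```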

You are right, I originally ran into this issue while debugging fairseq, but oddly enough I get the same deadlock when running the simple PyTorch example here.

Hi. I’m facing a similar deadlock issue, but I’m not using multiprocessing. I’ve posted the issue along with the gdb output at: https://discuss.pytorch.org/t/tensor-multiplication-hangs/18673 Any pointers would be greatly appreciated.