[solved] Segmentation fault (core dumped) when using DataParallel

Hi, I’m trying to run the code from the data parallelism tutorial:
http://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
but I get a segmentation fault when I run it on multiple GPUs on a single node. Here is the gdb output:

Starting program: …/anaconda3/bin/python data_parallel_tutorial.py
[Thread debugging using libthread_db enabled]
Missing separate debuginfo for …/anaconda3/lib/python3.6/site-packages/numpy/…/…/…/libiomp5.so
Detaching after fork from child process 209234.
[New Thread 0x2aaaf5418700 (LWP 209238)]
Let’s use 4 GPUs!
[New Thread 0x2aaaf5619700 (LWP 209239)]
[New Thread 0x2aabf5c00700 (LWP 209243)]
[New Thread 0x2aad04c00700 (LWP 209244)]
[New Thread 0x2aae0cc00700 (LWP 209245)]
[New Thread 0x2aaf164bd700 (LWP 209247)]
[New Thread 0x2aaf4c000700 (LWP 209248)]
[New Thread 0x2aaf4c201700 (LWP 209249)]
[New Thread 0x2aaf4c402700 (LWP 209250)]
[New Thread 0x2aaf4c603700 (LWP 209251)]
In Model: input size torch.Size([8, 5]) output size torch.Size([8, 2])

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaf4c000700 (LWP 209248)]
__mempcpy_ssse3 () at …/sysdeps/x86_64/multiarch/memcpy-ssse3.S:2836
2836 …/sysdeps/x86_64/multiarch/memcpy-ssse3.S: No such file or directory.
in …/sysdeps/x86_64/multiarch/memcpy-ssse3.S
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.el6_9.2.x86_64 libibverbs-devel-1.1.8-4.el6.x86_64 libipathverbs-1.3-3.el6.x86_64 libmlx4-1.0.6-7.el6.x86_64 libmlx5-1.0.2-1.el6.x86_64 libnl-1.1.4-2.el6.x86_64 sssd-client-1.13.3-57.el6_9.x86_64
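
For anyone debugging a similar crash: the gdb trace above only shows the C-level frame where the fault happened. One possible way to also see which Python call was active is the standard library's `faulthandler` module. This is a minimal sketch, assuming the script can be edited; it was not part of the original run.

```python
import faulthandler
import sys

# Dump the Python traceback of all threads if the process receives a fatal
# signal such as SIGSEGV. Output goes to the original stderr stream.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# ... the rest of data_parallel_tutorial.py would follow here unchanged ...
```

The same handler can be switched on without touching the script, either with `python -X faulthandler data_parallel_tutorial.py` or by setting `PYTHONFAULTHANDLER=1` in the environment.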

When I set CUDA_VISIBLE_DEVICES=0 (or any other single GPU), the code runs without any error.
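
The single-GPU workaround can also be applied from inside the script rather than from the shell. This is a rough sketch under two assumptions: the variable is set before CUDA is initialized (safest is before `torch` is imported), and `visible_gpu_count` is a hypothetical helper added here only for illustration.

```python
import os

# Restrict the process to one GPU. This has to happen before CUDA is
# initialized, so it is set before torch is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def visible_gpu_count():
    """Return the number of GPUs PyTorch can see, or None if torch is missing.

    Hypothetical helper for illustration; not part of the tutorial.
    """
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.device_count()


count = visible_gpu_count()
print("Visible GPUs:", count)
```

With only one device visible, the tutorial's multi-GPU branch should not be taken, which matches the behavior described above.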

The problem could be related to the environment, but I have no idea how to fix it. Any help would be appreciated.

Solved by re-installing PyTorch from source.