Segmentation fault (Core dump) when using GPU

(Elliothe) #1

Recently, IT support updated the GPU driver on the server to 390.12, and I then updated CUDA 9 and the cuDNN library to match. Since then, I get a segmentation fault as soon as I call the .cuda() function. The stack trace is attached below. The example I am running is the official MNIST example:

gdb python
(gdb) r 
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffa8a2a700 (LWP 34379)]
0x00007fffeac828d5 in ?? () from /usr/lib64/nvidia/
(gdb) bt
#0  0x00007fffeac828d5 in ?? () from /usr/lib64/nvidia/
#1  0x00007fffeadd2914 in ?? () from /usr/lib64/nvidia/
#2  0x00007fffead6ee80 in ?? () from /usr/lib64/nvidia/
#3  0x00007ffff7bc6e25 in start_thread () from /lib64/
#4  0x00007ffff78f434d in clone () from /lib64/

(Simon Wang) #2

Did you reinstall PyTorch with the CUDA 9 version?

(Elliothe) #3

Yes, I installed a local Anaconda 3 and then installed the CUDA 9 version of PyTorch.
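
One way to double-check which CUDA version the installed PyTorch build actually targets is to print the version details from Python. This is only a minimal sketch: the helper name describe_torch_cuda is made up for illustration, and it assumes a PyTorch version that exposes torch.version.cuda and torch.backends.cudnn.version(). If the driver itself is the problem, the torch.cuda.is_available() call may crash in the same way as .cuda().

```python
import importlib


def describe_torch_cuda():
    """Collect version details relevant to a driver/CUDA mismatch.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None}

    return {
        "torch": torch.__version__,
        # CUDA version the binary was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # cuDNN version PyTorch sees (None if cuDNN is unavailable).
        "cudnn": torch.backends.cudnn.version(),
        # Whether PyTorch can initialise a CUDA device with the current driver.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

Comparing the reported built_for_cuda value with the driver version shown by nvidia-smi can help tell whether the crash comes from the PyTorch build or from the driver installation.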