Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Mactarvish · May 7, 2018, 2:10am

I tried to train my model just now and it just stopped with such an error:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

In debug mode, however, everything runs normally. Yesterday everything was fine, and I haven’t changed anything since. What’s the problem?
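For readers decoding the message itself: the 139 is not PyTorch-specific. It follows the common shell convention of 128 plus the signal number, and SIGSEGV is signal 11 on Linux. The sketch below assumes a POSIX system; the child process deliberately sends SIGSEGV to itself as a stand-in for a real crash, and `shell_style_exit_code` is a hypothetical helper name, not part of any library.

```python
import signal
import subprocess
import sys

# Child program that sends SIGSEGV to itself (POSIX only; stands in for a real crash).
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def shell_style_exit_code(returncode: int) -> int:
    """Map subprocess's negative 'killed by signal N' code to the 128 + N convention."""
    return 128 - returncode if returncode < 0 else returncode


proc = subprocess.run([sys.executable, "-c", CHILD])
print("subprocess returncode:", proc.returncode)  # negative signal number on POSIX
print("shell-style exit code:", shell_style_exit_code(proc.returncode))
```

The exit code only says that the process touched memory it should not have; it does not say which library was responsible.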

Mactarvish · May 7, 2018, 2:32am

This is extremely strange to me. I just added a CUDA check at the top of my code, like this:

import ......

print(torch.cuda.is_available())

...
...
...

And the error vanished, with my code running normally… Does anyone know what’s going on? I never had such a check before, and the code still ran fine.

albanD · May 7, 2018, 8:52am

Hi,

Could you give a small code sample that reproduces this issue?

Nabarun_Goswami · May 7, 2018, 10:11am

This happened to me too, but it seems to occur randomly: once after 2 epochs, and once after 33. I also tried running the script inside gdb to get a stack trace of the crash, but then it didn’t segfault at all during the entire training run. I’m really confused as well.

BTW, I am on 0.5.0a0+8fbab83 on a Titan X Pascal with CUDA 8.0.

Arnold_MuST · August 14, 2018, 12:32pm

Hi Mactarvish

I have the exact same problem as you. I just imported torch and torch.cuda in the console to check whether CUDA is available with torch.cuda.is_available(), and now no PyCharm project works. Have you found a solution?

(EDIT)

Turns out the problem was my NVIDIA drivers. I switched back to the Intel drivers and it worked fine (after I reinstalled conda and PyTorch, which was a drag).

ilinaqin · May 17, 2019, 7:25am

Hi, have you solved this problem? When I test a model after training it, the error comes up randomly. Restarting the computer or running the test a few more times sometimes helps, but it doesn’t always work.

Pinocchioo · December 17, 2020, 11:16am

Wow, this is so strange, and the same thing happened to me. When I ran code that had worked fine a few days earlier, I got this error: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).
Then I just added a CUDA check:

import torch 
print(torch.cuda.is_available())

And then the code ran normally… It is so strange. Does anyone know why?

EvanZ · June 18, 2021, 5:53pm

Hmm, I’m suddenly having this issue where my training throws this error:

Using cuda
  1%|▏         | 21/1584 [00:00<01:00, 26.03it/s]
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

It seems like it gets through some iterations and then bam, segmentation fault.

I checked nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   50C    P5    21W / 250W |   4867MiB / 11177MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I just upgraded to torch 1.9.0. Is it possible my GPU drivers need to be updated?
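One way to approach the driver question is to print which CUDA version the installed torch build was compiled against and compare it with the "CUDA Version" shown by nvidia-smi, which is the highest CUDA runtime the driver supports. The sketch below only gathers that information; `torch_cuda_report` is a hypothetical helper name, and a matching pair of versions does not by itself rule out a driver problem.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Collect the versions relevant to a possible driver/toolkit mismatch."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA toolkit version this torch build was compiled with; None for CPU-only builds.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` is newer than the version nvidia-smi reports, updating the driver or installing a torch build for an older CUDA version are the two obvious things to try.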

Doge_Coin · September 21, 2023, 3:45am

I fixed the Process finished with exit code 139 (interrupted by signal 11: SIGSEGV) error.

In my case the problem was with import cv2. What worked was running pip install numpy==1.24.3.

That version of numpy fixed it for me, so the newest version of numpy seems to be what broke it.
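Before pinning anything, it may help to record which versions are actually installed, so the working and crashing environments can be compared. This is a minimal sketch using the standard-library `importlib.metadata`; the list of package names is an assumption (the OpenCV entries are the common PyPI distributions that provide the cv2 module), and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names to look up. The OpenCV variants are assumptions about
# which PyPI package provides cv2 in a given environment.
PACKAGES = [
    "numpy",
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "torch",
]


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Saving this output before and after a change such as pip install numpy==1.24.3 makes it easier to tell whether the numpy version was really the variable that mattered.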