Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

I tried to train my model just now and it just stopped with such an error:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

but in debug mode everything runs normally, and yesterday everything was OK and I haven’t changed anything since. What’s the problem?

This is extremely strange to me. I just added a CUDA check at the top of my code, like this:

import ......

print(torch.cuda.is_available())

...
...
...

After that the error vanished and my code ran normally… Does anyone know what’s going on? I never had such a check before, and the code ran fine without it.

Hi,

Could you give a small code sample that reproduces this issue?

Happened to me too, but it seems to occur randomly: once after 2 epochs, and once after 33. I also tried running the script inside gdb to get a stack trace of the crash, but then it didn’t segfault at all during the entire training run. Really confused here as well.

BTW, I am on 0.5.0a0+8fbab83 on a Titan X Pascal with CUDA 8.0.
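Since the crash did not reproduce under gdb, one alternative worth trying is Python's standard-library faulthandler module, which prints the Python-level traceback when the process receives SIGSEGV. The snippet below is only a minimal sketch: the helper name enable_crash_traceback and the log file name are made up for illustration, and the call would need to go at the very top of the training script, before the model is built.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Dump the Python traceback of all threads to log_path on a fatal signal.

    Hypothetical helper; the name and default path are illustrative only.
    """
    # The file object must stay open for as long as the handler is enabled,
    # so it is returned to the caller instead of being closed here.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call this before the training code runs.
crash_log = enable_crash_traceback()
```

The same handler can also be switched on without touching the code by starting the interpreter with python -X faulthandler, in which case the traceback goes to stderr. It only shows which Python line was executing at the time of the crash, not the native stack, so it narrows the search rather than explaining the fault.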

Hi Mactarvish

I have the exact same problem as you. I just imported torch and torch.cuda in the console to check whether CUDA is available using torch.cuda.is_available(), and now no PyCharm project is working. Have you found a solution to this?

(EDIT)

Turns out the problem was my NVIDIA drivers. I switched back to the Intel drivers and it worked fine (after I reinstalled conda and PyTorch, which was a drag).

Hi, have you solved this problem? When I test a model after I’ve trained it, the error comes up randomly. If I restart the computer or run the test a few more times, it sometimes works, but not always.

Wow, this is so strange, and the same thing happened to me. When I ran code that had worked fine a few days earlier, I got this error: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).
Then I just added a CUDA check:

import torch 
print(torch.cuda.is_available())

And the code ran normally… It is so strange, and I would like to know whether anyone knows why.

Hmm, my training has suddenly started throwing these errors:

Using cuda
  1%|▏         | 21/1584 [00:00<01:00, 26.03it/s]
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

It seems like it gets through some iterations and then bam, segmentation fault.

I checked nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03   Driver Version: 450.119.03   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   50C    P5    21W / 250W |   4867MiB / 11177MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Having just upgraded to torch 1.9.0, is it possible my GPU drivers need to be updated?
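One way to start checking this is to print the CUDA version the installed torch build was compiled against and compare it with the "CUDA Version" field in the nvidia-smi header, which reports the highest CUDA version the driver supports. The sketch below assumes nothing about the actual setup in this thread; collect_versions is a hypothetical helper name, and it degrades gracefully if torch cannot be imported.

```python
import platform


def collect_versions():
    """Gather version info relevant to a driver/CUDA mismatch (hypothetical helper)."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA version this torch build was compiled with (None for CPU-only builds).
    info["torch_built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_0"] = torch.cuda.get_device_name(0)
    return info


versions = collect_versions()
for key, value in versions.items():
    print(f"{key}: {value}")
```

If torch_built_with_cuda is newer than what the driver reports, a driver update is a plausible thing to try, though a mismatch is only one of several possible causes of a segfault.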