Issue: Kernel dies when fitting PyTorch model on Linux Mint


Could you export the notebook as a script and run it in your terminal?
This will most likely return an error message instead of just a kernel restart.

(Artemiy) #3

Like this?
OK, I get this error, but what does it mean?
There are no errors on Windows.


I assume you are running and editing your notebook in a browser. You can export it via:

File -> Download as -> Python (.py)

Alternatively you might use

jupyter nbconvert --to script your_notebook.ipynb
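
If nbconvert is not at hand, the sketch below shows roughly what the export amounts to, assuming the notebook uses the usual nbformat 4 JSON layout. It is a simplified stand-in, not nbconvert itself: `notebook_code` is a hypothetical helper name, and IPython magics (lines starting with `%` or `!`) are copied verbatim, so they would need to be removed by hand.

```python
import json


def notebook_code(path):
    """Concatenate the source of all code cells in a .ipynb file.

    Assumes the nbformat 4 layout: a top-level "cells" list whose entries
    carry "cell_type" and "source".
    """
    with open(path, encoding="utf-8") as fh:
        nb = json.load(fh)

    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)

    # Separate cells with a blank line and end with a newline
    return "\n\n".join(chunks) + "\n"
```

Writing the returned string to `your_notebook.py` and running it with `python your_notebook.py` should then surface the error in the terminal.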

(Artemiy) #5

Translated from Russian: Invalid instruction (the memory stack is flushed to disk)


Could you run your script with pdb to get the stack trace?
The error message would probably translate to illegal instruction (core dumped).
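
A minimal sketch of how to confirm which signal kills the script, assuming a POSIX system: run it in a child process and decode the return code. `run_and_report` is a hypothetical helper name. Passing `-X faulthandler` makes CPython print the Python-level traceback to stderr before the process dies; it will not show the native frames, so it complements a debugger rather than replacing it.

```python
import signal
import subprocess
import sys


def run_and_report(argv):
    """Run a command in a child process and describe how it ended.

    Returns (status, stderr). On POSIX, a negative return code -N means
    the child was terminated by signal N.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode < 0:
        sig = signal.Signals(-proc.returncode)
        return f"killed by {sig.name}", proc.stderr
    return f"exited with code {proc.returncode}", proc.stderr


# Example call (the script name is a placeholder):
# status, stderr = run_and_report(
#     [sys.executable, "-X", "faulthandler", "your_notebook.py"])
# print(status)
# print(stderr)
```

An illegal instruction would be expected to show up here as `killed by SIGILL`.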

(Artemiy) #7

I’m sorry for so many screenshots, but here is everything.

(colesbury) #8

This sounds similar to the topic "Unable to sum the result of an equality test".

Do you know what model CPU you have on the Linux Mint machine?

(colesbury) #9

Also, it looks like you have both the nightly PyTorch build (0.5.0a0) and PyTorch-CPU (0.4.1) installed. I’m not sure which version you are running. Can you uninstall the older pytorch-cpu build?

conda uninstall pytorch-cpu
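
To see which installation Python would actually pick up, something like the following may help. It is a sketch using only the standard library; `locate_module` and `installed_versions` are hypothetical helper names, and the name prefixes are an assumption about how the packages register themselves (conda packages do not always expose the same distribution names as pip).

```python
import importlib.util
from importlib import metadata


def locate_module(name):
    """Return the file Python would import `name` from, or None if absent.

    find_spec does not import a top-level module, so this is safe to call
    even if importing torch itself crashes.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def installed_versions(prefixes=("torch", "pytorch")):
    """List (name, version) of installed distributions matching a prefix."""
    found = []
    for dist in metadata.distributions():
        dist_name = dist.metadata.get("Name") or ""
        if dist_name.lower().startswith(prefixes):
            found.append((dist_name, dist.version))
    return sorted(found)


print("torch would be imported from:", locate_module("torch"))
print("matching distributions:", installed_versions())
```

If more than one matching distribution is listed, or the path points somewhere unexpected, that would suggest the two builds are still mixed.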

(Artemiy) #10

CPU: AMD A6-6310

After I uninstalled pytorch-cpu, Python can’t find torch.nn.

(Artemiy) #11

I read the topic "Unable to sum the result of an equality test".
So PyTorch cannot run on my CPU?

(colesbury) #13

Your CPU should be OK. It looks like there is a bug in PyTorch, but I am not sure which PyTorch version you are using.

Please try the following. First fully uninstall PyTorch:

conda uninstall -y pytorch-cpu
conda uninstall -y pytorch

Next try the nightly CPU build from yesterday:

pip install

Please let me know if this works.

(Artemiy) #15

Unfortunately, it didn’t help. The same error again.

(colesbury) #16

Can you try running your script under gdb and report the backtrace?

$ gdb --args python
Reading symbols from python...done.
(gdb) run
(gdb) backtrace

(Artemiy) #17

(colesbury) #18

Thanks, this is very helpful. Can you also run disas and report the output?

$ gdb --args python
Reading symbols from python...done.
(gdb) run
(gdb) disas

(Artemiy) #19

(colesbury) #20

OK, it looks like the FMA4 vfmaddps instruction is the problem. I’m a bit confused because your CPU should support that instruction.

  1. Can you report the cpu flags: grep flags < /proc/cpuinfo
  2. Can you report the kernel version: uname -a
  3. Are you running in a virtual machine?
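
For item 1, a short sketch that picks out the relevant flags may be easier to read than the full list. It assumes the x86 Linux layout of /proc/cpuinfo, where a line starts with `flags`; the helper names and the selection of flags are illustrative only.

```python
from pathlib import Path

# Instruction-set flags that seem relevant here (an illustrative selection)
FLAGS_OF_INTEREST = ("sse4_1", "sse4_2", "avx", "avx2", "fma", "fma4")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags):
    """Map each flag of interest to True/False depending on its presence."""
    return {name: name in flags for name in FLAGS_OF_INTEREST}


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print(report(parse_cpu_flags(cpuinfo.read_text())))
```

The flags are compared as whole words on purpose: `fma` (FMA3) and `fma4` are different extensions, and a plain substring search for `fma` would match both.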

(Artemiy) #21

About the VM: no, I’m not.

(Artemiy) #22

So? Any results on this problem?

(colesbury) #23

No. I’m still not sure if your CPU is supposed to support the FMA4 instructions. I see conflicting information on AMD’s website. I’m asking AMD engineers about this. In the meantime, can you download, compile, and run this program which prints out information about your CPU:

curl > cpuid.c
gcc cpuid.c