Issue: Kernel dies when fitting a PyTorch model on Linux Mint

Could you export the notebook as a script and run it in your terminal?
This will most likely return an error message instead of just a kernel restart.

Like this?
OK, I get this error, but what does it mean?
There are no errors on Windows.

I assume you are running and editing your notebook in a browser. You can export it via:

File -> Download as -> Python (.py)

Alternatively you might use

jupyter nbconvert --to script your_notebook.ipynb
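
If nbconvert is not at hand, the code cells can also be pulled out directly, since a notebook file is JSON. This is only a minimal sketch, assuming the nbformat 4 layout (a top-level "cells" list); `notebook_to_script` is a hypothetical helper name, and IPython magics such as `%matplotlib inline` are copied verbatim and would need to be removed by hand.

```python
import json


def notebook_to_script(nb_json: str) -> str:
    """Return the concatenated source of all code cells in a notebook.

    Assumes the nbformat 4 layout: a top-level "cells" list whose entries
    carry "cell_type" and "source" (a string or a list of strings).
    """
    notebook = json.loads(nb_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line and end with a single newline.
    return "\n\n".join(chunks) + "\n"
```

To use it, read your_notebook.ipynb as text, pass the contents to `notebook_to_script`, and write the result to a .py file.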

Translated from Russian: Invalid instruction (the memory stack is flushed to disk)

Could you run your script with pdb to get the stack trace?
The error message would probably translate to illegal instruction (core dumped).
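
A side note on getting a Python-level trace: pdb reports Python exceptions, so it may show nothing when the process is killed by a signal raised in native code. The standard-library faulthandler module is one option for that case. A minimal sketch, assuming the crash arrives as a signal such as SIGILL and that stderr is a real file descriptor (as it is in a terminal):

```python
import faulthandler
import sys

# Dump the Python traceback of every thread to stderr if the process
# receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(file=sys.stderr, all_threads=True)

# ... the rest of the exported notebook script goes here ...
```

The same handler can be switched on without editing the script by running it as `python -X faulthandler your_notebook.py`. It shows which Python line triggered the crash; the native backtrace still needs a debugger such as gdb.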

Sorry for so many screenshots, but here is everything.

This sounds similar to the thread "Unable to sum the result of an equality test".

Do you know what model CPU you have on the Linux Mint machine?

Also, it looks like you have both the nightly PyTorch build (0.5.0a0) and PyTorch-CPU (0.4.1) installed. I’m not sure which version you are running. Can you uninstall the older pytorch-cpu build?
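
One way to see which install Python would actually pick up, without importing torch (and so without triggering the crash), is sketched below. It assumes Python 3.8 or newer for `importlib.metadata`; `describe_package` is a hypothetical helper name, and the version it reports comes from package metadata, which may be missing or stale when two builds overlap in one environment.

```python
import importlib.util
from importlib import metadata


def describe_package(name="torch"):
    """Report where `name` would be imported from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "found": False, "origin": None, "version": None}
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata under this name.
        version = None
    return {"name": name, "found": True, "origin": spec.origin, "version": version}


print(describe_package("torch"))
```

The "origin" path shows which site-packages directory wins, which helps tell the nightly build from the pytorch-cpu one.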

conda uninstall pytorch-cpu

CPU: AMD A6-6310

After I uninstalled pytorch-cpu, Python can’t find torch.nn.

I read the thread "Unable to sum the result of an equality test".
So PyTorch cannot run on my CPU?

Your CPU should be OK. It looks like there is a bug in PyTorch, but I am not sure which PyTorch version you are using.

Please try the following. First fully uninstall PyTorch:

conda uninstall -y pytorch-cpu
conda uninstall -y pytorch

Next try the nightly CPU build from yesterday:

pip install

Please let me know if this works.

Unfortunately, it didn’t help. The same error again.

Can you try running your script under gdb and report the backtrace?

$ gdb --args python
Reading symbols from python...done.
(gdb) run
(gdb) backtrace

Thanks, this is very helpful. Can you also run disas and report the output?

$ gdb --args python
Reading symbols from python...done.
(gdb) run
(gdb) disas

OK, it looks like the FMA4 vfmaddps instruction is the problem. I’m a bit confused because your CPU should support that instruction.

  1. Can you report the cpu flags: grep flags < /proc/cpuinfo
  2. Can you report the kernel version: uname -a
  3. Are you running in a virtual machine?
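
For the first item, the flags line is long and easy to misread (a plain search for "fma" also matches "fma4"). A small sketch that checks exact flag names, assuming an x86 Linux machine where /proc/cpuinfo has a "flags" line; `cpu_flags` and `report` are hypothetical helper names, and the default flag list is only a guess at what is relevant here:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text, wanted=("avx", "avx2", "fma", "fma4")):
    """Map each flag name in `wanted` to whether the CPU advertises it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


# Usage on the affected machine:
#     with open("/proc/cpuinfo") as fh:
#         print(report(fh.read()))
```

If "fma4" comes back False, the kernel does not advertise FMA4 for this CPU, which would be consistent with an illegal-instruction crash on vfmaddps.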

About the VM: no, I’m not running in one.

So, any progress on this problem?

No. I’m still not sure if your CPU is supposed to support the FMA4 instructions. I see conflicting information on AMD’s website. I’m asking AMD engineers about this. In the meantime, can you download, compile, and run this program which prints out information about your CPU:

curl > cpuid.c
gcc cpuid.c