Issue: Kernel dies when fitting a PyTorch model on Linux Mint

Issue description

Hello there. I’m using Jupyter Notebook on Linux Mint x64 and I have a serious problem with PyTorch: when I run this code, the kernel dies. The same code runs perfectly on Windows 8.1 x64. I have tried different ways of installing PyTorch (pip, conda, and building from the GitHub source), and nothing helps. Please explain why this happens every time and how to fix it.

Code example

import torch
from torch import nn
import torch.nn.functional as F
from notmnist import load_notmnist
X_train, y_train, X_test, y_test = load_notmnist(letters='AB')
X_train, X_test = X_train.reshape([-1, 784]), X_test.reshape([-1, 784])
model = nn.Sequential()
model.add_module('l1', nn.Linear(784, 1))
model.add_module('l2', nn.Sigmoid())

x = torch.tensor(X_train[:3], dtype=torch.float32)
y = torch.tensor(y_train[:3], dtype=torch.float32)
y_predicted = model(x)[:, 0]

System Info


PyTorch version: 0.5.0a0+b640264
Is debug build: No
CUDA used to build PyTorch: None

OS: Linux Mint 19 Tara
GCC version: (Ubuntu 7.3.0-16ubuntu3) 7.3.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip] Could not collect
[conda] pytorch-cpu 0.4.1 py36_cpu_1 pytorch
[conda] torch 0.5.0a0+b640264
[conda] torchvision-cpu 0.2.1 py36_1 pytorch
If you need any logs, please tell me where I can find them.

Could you export the notebook as a script and run it in your terminal?
This will most likely return an error message instead of just a kernel restart.

Like this?
OK, I get this error, but what does it mean?
There are no errors on Windows.

I assume you are running and editing your notebook in a browser. You can export it via:

File -> Download as -> Python (.py)

Alternatively you might use

jupyter nbconvert --to script your_notebook.ipynb

Translated from Russian: "Invalid instruction (the memory stack is flushed to disk)"

Could you run your script with pdb to get the stack trace?
The error message would probably translate to illegal instruction (core dumped).
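If pdb exits without printing anything useful (a Python-level debugger cannot intercept a fatal signal such as SIGILL), the standard-library faulthandler module may be worth a try: it prints the Python traceback when the process dies from such a signal. A minimal sketch, where run_model is a hypothetical placeholder for the exported notebook code:

```python
import faulthandler
import sys

# Dump the Python traceback if the process receives a fatal signal
# (SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL). sys.__stderr__ is the
# interpreter's original stderr stream, used here so the dump is not
# lost if sys.stderr has been redirected.
faulthandler.enable(file=sys.__stderr__, all_threads=True)


def run_model():
    # Hypothetical placeholder: paste the code exported from the
    # notebook here (the torch imports, model definition, forward pass).
    pass


if __name__ == "__main__":
    run_model()
```

Running the unmodified script as python -X faulthandler my_script.py should have the same effect without editing the file. The traceback only shows which Python line triggered the crash, so gdb is still needed to see the native frames.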

I’m sorry for so many screenshots, but here is everything.

This sounds similar to the thread "Unable to sum the result of an equality test".

Do you know what model CPU you have on the Linux Mint machine?

Also, it looks like you have both the nightly PyTorch build (0.5.0a0) and PyTorch-CPU (0.4.1) installed. I’m not sure which version you are running. Can you uninstall the older pytorch-cpu build?

conda uninstall pytorch-cpu
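To see which of the two installs Python actually picks up, something like the following sketch could be run in the same environment as the notebook. The helper name describe_install is made up for illustration; it only relies on the standard importlib module and on the __version__ and __file__ attributes that torch exposes.

```python
import importlib


def describe_install(module_name):
    """Return (version, file path) for an importable module, or None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Not every module defines these attributes, so fall back to a label.
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return version, location


if __name__ == "__main__":
    info = describe_install("torch")
    if info is None:
        print("torch is not importable in this environment")
    else:
        version, location = info
        print("torch version:", version)
        print("loaded from:  ", location)
```

The printed path shows whether the import resolves to the conda package or to the source build, which is the information needed before removing either one.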

CPU: AMD A6-6310

After I uninstalled pytorch-cpu, Python can’t find torch.nn.

I read the topic "Unable to sum the result of an equality test".
So PyTorch cannot run on my CPU?

Your CPU should be OK. It looks like there is a bug in PyTorch, but I am not sure which PyTorch version you are using.

Please try the following. First fully uninstall PyTorch:

conda uninstall -y pytorch-cpu
conda uninstall -y pytorch

Next try the nightly CPU build from yesterday:

pip install https://download.pytorch.org/whl/nightly/cpu/torch_nightly-2018.8.14.dev1-cp36-cp36m-linux_x86_64.whl

Please let me know if this works.

Unfortunately, it didn’t help. The same error again.

Can you try running your script under gdb and report the backtrace?

$ gdb --args python my_script.py
...
Reading symbols from python...done.
(gdb) run
...
(gdb) backtrace
...



Thanks, this is very helpful. Can you also run disas and report the output?

$ gdb --args python my_script.py
...
Reading symbols from python...done.
(gdb) run
...
(gdb) disas
...

OK, it looks like the FMA4 vfmaddps instruction is the problem. I’m a bit confused because your CPU should support that instruction.

  1. Can you report the cpu flags: grep flags < /proc/cpuinfo
  2. Can you report the kernel version: uname -a
  3. Are you running in a virtual machine?
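For the first item, a small Python sketch can pull out just the flags that matter instead of pasting the whole line. The helper names and the choice of flags to report are assumptions made for illustration. The flag spellings are the ones Linux uses in /proc/cpuinfo.

```python
from pathlib import Path

# Instruction-set flags that seem relevant to this crash (an assumed
# selection, not an exhaustive list).
FLAGS_OF_INTEREST = ("sse4_1", "sse4_2", "avx", "avx2", "fma", "fma4")


def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags):
    """Map each flag of interest to True (present) or False (absent)."""
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        flags = parse_cpu_flags(path.read_text())
        for name, present in report(flags).items():
            print(f"{name:8s} {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only works on Linux")
```

If fma4 shows up as absent while the disassembly stops on an FMA4 instruction, that would point to the binary using an instruction the CPU does not advertise.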

About the VM: no, I’m not running in one.

So? Are there any results for this problem?