Bus error (core dumped) keeps happening during training

I keep getting the error message ‘Bus error (core dumped)’ at random times during training.

I’m running Ubuntu 18.04 with Anaconda, conda 4.8.1, Python 3.7.4, and PyTorch 1.4.

I have an RTX 2080 Ti, and training only uses about 3 GB of RAM, so running out of memory shouldn’t be the cause.

Does anyone have any suggestions for what to try?

Generally speaking, a SIGILL error occurs when the binary contains instructions that the CPU does not support. The installation method on the official website ships a prebuilt binary, and in my case that binary did not match my CPU (AMD instead of Intel). So PyTorch needs to be installed from source, as follows:

git clone --recursive https://github.com/pytorch/pytorch

cd pytorch

sudo conda uninstall pytorch torchvision -c soumith

sudo python3 setup.py install
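
If you want to check the CPU-mismatch theory before spending time on a source build, here is a minimal sketch (Linux on x86 only) that lists which SIMD instruction-set flags the CPU reports in /proc/cpuinfo. The helper names and the list of flags are my own illustration. I don't know which instruction sets a particular PyTorch binary actually requires, so treat the output as a hint rather than proof.

```python
# Sketch: list the SIMD instruction-set flags the CPU reports (Linux, x86 only).
from pathlib import Path

# Flags of interest. This list is an illustrative assumption,
# not a statement of what any PyTorch binary requires.
SIMD_FLAGS = ("sse4_1", "sse4_2", "avx", "avx2", "fma", "avx512f")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report_simd_support(flags, wanted=SIMD_FLAGS):
    """Map each wanted flag name to True or False depending on whether it is reported."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = parse_cpu_flags(cpuinfo.read_text())
        for name, present in report_simd_support(flags).items():
            print(f"{name:10s} {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only works on Linux.")
```

If a flag such as avx2 shows up as "no", a binary compiled to use it would be a plausible cause of a SIGILL, and building from source on that machine avoids the mismatch.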

I’ve run into the same problem. Were you able to solve it? Thanks in advance.