Segmentation fault (core dumped) in torch 2.1.0

Hello
I’m running into a problem while training a mamba-asr model. The script crashes with the message “Segmentation fault (core dumped)” after a few steps, and the crash seems to occur randomly. A second error also shows up during training: “RuntimeError: d.is_cuda() INTERNAL ASSERT FAILED at “…/c10/cuda/impl/CUDAGuardImpl.h”:31, please report a bug to PyTorch.”

Here are my settings:
Ubuntu 22.04
CUDA 12.1
GPU: A6000 x2, 49GB
PyTorch 2.1.0

Is there a way to fix this?
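
Since the segfault is random and carries no Python traceback, one possible first step is to turn on Python's built-in faulthandler so the interpreter writes a traceback when the process crashes. This is only a rough sketch: the log file name and the `train()` function are hypothetical placeholders, not part of the actual script.

```python
import faulthandler

def enable_crash_traceback(path="segfault_traceback.log"):
    """Write Python tracebacks of all threads to `path` on a fatal signal.

    `path` is a hypothetical file name; the returned file object must stay
    open for as long as faulthandler is enabled.
    """
    log_file = open(path, "w")
    # Installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file

def train():
    # Hypothetical placeholder for the real training loop.
    pass

crash_log = enable_crash_traceback()
train()
```

The traceback only shows which Python line was executing when the crash happened, not the root cause. Running with the environment variable CUDA_LAUNCH_BLOCKING=1 may also help, because CUDA errors are then reported closer to the call that triggered them.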

Do you see these issues in the latest stable or nightly releases?

No, I haven’t tried that yet. However, when I run the same code in a single-GPU environment (NVIDIA RTX 4060), no errors occur at all. Could the problem be caused by how the two A6000s are connected?
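
One way to start checking the two-GPU setup is to ask PyTorch whether each visible GPU can access the other's memory directly (peer-to-peer). The sketch below assumes nothing about the training script; `peer_access_report` is a hypothetical helper name, and it returns an empty result if PyTorch or CUDA is unavailable.

```python
def peer_access_report():
    """Return {(i, j): bool} for every ordered pair of visible GPUs."""
    try:
        import torch
    except ImportError:
        return {}
    if not torch.cuda.is_available():
        return {}
    report = {}
    n = torch.cuda.device_count()
    for i in range(n):
        for j in range(n):
            if i != j:
                # True if device i can directly access memory on device j.
                report[(i, j)] = torch.cuda.can_device_access_peer(i, j)
    return report

report = peer_access_report()
for (src, dst), ok in sorted(report.items()):
    print(f"GPU {src} -> GPU {dst}: peer access {'yes' if ok else 'no'}")
```

A “no” here does not prove a hardware or cabling fault, and a “yes” does not rule one out. To separate multi-GPU problems from model problems, it may also be worth running the same script on just one of the A6000s by setting CUDA_VISIBLE_DEVICES=0 and seeing whether the crash still appears.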