ROCm on Linux: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

I use an RX 6700 XT. The ROCm version is the latest, 5.4.3, PyTorch is 0.15.1+ROCM5.4.2, and the official FashionMNIST example crashes with the error below.



Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])

Shape of y: torch.Size([64]) torch.int64

Using cuda device

Epoch 1

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
Can anyone help me solve this issue?

It seems you are encountering a segmentation fault (SIGSEGV) error, which can be caused by various reasons, such as memory access issues, incorrect library versions, or problems with the underlying system. To troubleshoot this issue, consider the following steps:

  1. Update your PyTorch version: You mentioned that you are using PyTorch 0.15.1+ROCM5.4.2, which is an older version. Update your PyTorch installation to the latest version compatible with your ROCm release by following the official ROCm installation guide ("Deep Learning" in the ROCm 4.5.0 documentation).
  2. Check for GPU compatibility: Ensure that your AMD Radeon RX 6700 XT is compatible with the ROCm version you are using.
  3. Run a ROCm validation test: Make sure your ROCm installation is working correctly. You can do this by running the following command in your terminal:
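
One way to check the installation from the Python side is a small smoke test like the sketch below. This is an assumption on my part and not necessarily the command meant above; the function name `gpu_smoke_test` is hypothetical. It reports the PyTorch build and, if a GPU is visible, runs one tiny operation on it. Note that the original log already prints "Using cuda device" before crashing, so the useful part is the small tensor operation: if the setup is broken in the same way, this step may itself end with SIGSEGV instead of returning.

```python
def gpu_smoke_test():
    """Return a small report on whether PyTorch can actually use the GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.hip is set on ROCm builds and None on other builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds of PyTorch expose the GPU through the torch.cuda API.
        "gpu_available": torch.cuda.is_available(),
    }

    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # Run one tiny operation on the device and check the result.
        x = torch.ones(4, device="cuda")
        report["sum_ok"] = float((x + x).sum().item()) == 8.0

    return report
```

Calling `print(gpu_smoke_test())` from the same environment as the training script shows whether the build is a ROCm one and whether a trivial GPU operation completes.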

Thank you! I have solved this issue.

I had forgotten to set the environment variables below.
export HSA_OVERRIDE_GFX_VERSION=10.3.0 # RX6700xt
export LD_LIBRARY_PATH=/opt/rocm/lib
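
If the script is started from an IDE or another launcher that does not read the shell profile, the same two variables can be passed explicitly. The following is a minimal sketch under that assumption: the values are the ones from the exports above, while the helper names `rocm_env` and `run_with_rocm_env` are hypothetical. Like the export above, it replaces `LD_LIBRARY_PATH` rather than appending to it.

```python
import os
import subprocess
import sys

# Values copied from the exports in this post.
ROCM_OVERRIDES = {
    "HSA_OVERRIDE_GFX_VERSION": "10.3.0",  # RX 6700 XT
    "LD_LIBRARY_PATH": "/opt/rocm/lib",
}


def rocm_env(base=None):
    """Return a copy of the environment with the ROCm overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(ROCM_OVERRIDES)
    return env


def run_with_rocm_env(script_path):
    """Run a Python script in a child process that sees the overrides."""
    result = subprocess.run([sys.executable, script_path], env=rocm_env())
    # A negative return code means the child was killed by that signal number.
    return result.returncode
```

For example, `run_with_rocm_env("train.py")` would start a training script (the file name is a placeholder) with both variables set, without changing the environment of the calling process.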