Setup.py says USE_ROCM:OFF, while I set USE_ROCM=1

I use the AMD-provided deb repository, so naturally I tend to use that recipe for building.

From the error message, it looks like you don't have a full ROCm installation. For me, clang is in the llvm-amdgpu package:

$ dpkg -S /opt/rocm-4.0.0/llvm/bin/clang++
llvm-amdgpu: /opt/rocm-4.0.0/llvm/bin/clang++

One thing AMD doesn't handle well in this repo is upgrades. Specifically, they don't bump version numbers properly: the llvm-amdgpu package is at version 12.0.dev and has been since at least ROCm 3.10. As a result, apt-get upgrade or dist-upgrade won't pick up the new build.
The official instructions say you need to remove and re-install, but in my experience it is sufficient to pass --reinstall to apt (something like apt-get install --reinstall $(dpkg -S /opt/rocm-*/ | sed 's/,//g;s/:.*//')). Checking which packages still refer to the old path (e.g. dpkg -S /opt/rocm-3.10) is a good way to find out what has been left behind and needs reinstalling.
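The sed in that one-liner works because dpkg -S prints lines like "pkg-a, pkg-b: /opt/rocm-4.0.0", so dropping the commas and everything after the colon leaves just package names. Below is a slightly more defensive sketch of the same idea as a small shell function. The name dpkg_owners is hypothetical (not part of dpkg or apt); it additionally skips "diversion ..." lines and removes duplicates, which matters when a glob matches many files owned by the same package. Like the original sed, it also drops any ":amd64"-style architecture suffix.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -S` output on stdin and print each owning
# package name once, e.g. "pkg-a, pkg-b: /opt/rocm-4.0.0" -> pkg-a, pkg-b.
dpkg_owners() {
  # 1. skip "diversion by ..." lines
  # 2. drop everything from the first ':' (path and any ":arch" suffix)
  # 3. turn commas/spaces into one name per line, drop blanks, de-duplicate
  sed -e '/^diversion /d' -e 's/:.*//' \
    | tr ',' ' ' \
    | tr -s ' ' '\n' \
    | sed '/^$/d' \
    | sort -u
}
```

Usage would then look something like apt-get install --reinstall $(dpkg -S /opt/rocm-*/ | dpkg_owners) for the reinstall, and dpkg -S '/opt/rocm-3.10*' | dpkg_owners to list leftovers. The quotes let dpkg do the wildcard matching itself, and 3.10 is only an example prefix; adjust it to whatever old version you had installed.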

Best regards

Thomas

P.S.: If you know the secret handshake, you can also pip install nightlies from
https://download.pytorch.org/whl/nightly/rocm4.0/torch_nightly.html
