I’m able to compile PyTorch on Ubuntu 24.04 (with tools/amd_build/build_amd.py applied), but no matter what I try, USE_ROCM is disabled.
(venv) USER@WORKSTATION:~/dev/repositories/public/pytorch$ python setup.py clean
(venv) USER@WORKSTATION:~/dev/repositories/public/pytorch$ USE_ROCM=1 RCCL_DIR=/opt/rocm/lib/cmake/rccl PYTORCH_ROCM_ARCH=gfx1032 hip_DIR=/opt/rocm/lib/cmake/hip USE_NVCC=OFF BUILD_CAFFE2_OPS=0 PATH=/usr/lib/ccache/:$PATH USE_CUDA=OFF python setup.py bdist_wheel
...
disabling ROCM because NOT USE_ROCM is set
...
-- USE_ROCM : OFF
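One way to cross-check the summary line above is to look at what CMake actually recorded for the build. The following is only a minimal sketch, not part of the PyTorch tooling: it assumes the build directory is `build/` inside the checkout, that the cache file is `build/CMakeCache.txt`, and that the variables of interest (`USE_ROCM`, `USE_CUDA`, `hip_DIR`) are stored there. The helper name `read_cache_entries` is hypothetical.

```python
import re
from pathlib import Path

# CMake cache entries have the form NAME:TYPE=VALUE.
# Lines starting with '#' or '//' are comments and do not match this pattern.
_ENTRY = re.compile(r"^([A-Za-z_][\w.\-]*):([A-Z]+)=(.*)$")


def read_cache_entries(cache_text, names):
    """Return {name: value} for the requested cache variables found in cache_text."""
    wanted = set(names)
    found = {}
    for line in cache_text.splitlines():
        match = _ENTRY.match(line.strip())
        if match and match.group(1) in wanted:
            found[match.group(1)] = match.group(3)
    return found


if __name__ == "__main__":
    # Assumption: run from the root of the PyTorch checkout after a build attempt.
    cache_file = Path("build/CMakeCache.txt")
    if cache_file.exists():
        entries = read_cache_entries(
            cache_file.read_text(), ["USE_ROCM", "USE_CUDA", "hip_DIR"]
        )
        for name, value in sorted(entries.items()):
            print(f"{name} = {value}")
    else:
        print(f"{cache_file} not found; run the build first")
```

If `USE_ROCM` shows up as `OFF` here even though `USE_ROCM=1` was exported, the value was changed during configuration rather than never being passed in, which narrows down where to look in the full build log.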
I have installed the amdgpu driver with ROCm version 7.2.
# dpkg -l | grep amdgpu
ii amdgpu-core 1:7.2.70200-2278374.22.04 all Core meta package for unified amdgpu driver.
ii amdgpu-dkms 1:6.16.13.30300000-2278356.22.04 all amdgpu driver in DKMS format.
ii amdgpu-dkms-firmware 30.30.0.0.30300000-2278356.22.04 all firmware blobs used by amdgpu driver in DKMS format
ii amdgpu-install 30.30.0.0.30300000-2278356.22.04 all AMDGPU driver repository and installer
ii amdgpu-lib 1:7.2.70200-2278374.22.04 amd64 Meta package to install amdgpu userspace components.
ii amdgpu-lib32 1:7.2.70200-2278374.22.04 amd64 Meta package to support i386 runtime on amd64 architecture
ii amdgpu-multimedia 1:7.2.70200-2278374.22.04 amd64 Meta package to install mesa multimedia components.
ii libdrm-amdgpu-amdgpu1:amd64 1:2.4.125.70200-2278374.22.04 amd64 Userspace interface to amdgpu-specific kernel DRM services -- runtime
ii libdrm-amdgpu-amdgpu1:i386 1:2.4.125.70200-2278374.22.04 i386 Userspace interface to amdgpu-specific kernel DRM services -- runtime
ii libdrm-amdgpu-common 1.0.0.70200-2278374.22.04 all List of AMD/ATI cards' device IDs, revision IDs and marketing names
ii libdrm-amdgpu-dev:amd64 1:2.4.125.70200-2278374.22.04 amd64 Userspace interface to kernel DRM services -- development files
ii libdrm-amdgpu-radeon1:amd64 1:2.4.125.70200-2278374.22.04 amd64 Userspace interface to radeon-specific kernel DRM services -- runtime
ii libdrm-amdgpu1:amd64 2.4.125-1ubuntu0.1~24.04.1 amd64 Userspace interface to amdgpu-specific kernel DRM services -- runtime
ii libdrm2-amdgpu:amd64 1:2.4.125.70200-2278374.22.04 amd64 Userspace interface to kernel DRM services -- runtime
ii libdrm2-amdgpu:i386 1:2.4.125.70200-2278374.22.04 i386 Userspace interface to kernel DRM services -- runtime
ii libegl1-amdgpu-mesa:amd64 1:26.0.0.70200-2278374.22.04 amd64 free implementation of the EGL API -- Mesa vendor library
ii libegl1-amdgpu-mesa:i386 1:26.0.0.70200-2278374.22.04 i386 free implementation of the EGL API -- Mesa vendor library
ii libegl1-amdgpu-mesa-drivers:amd64 1:26.0.0.70200-2278374.22.04 amd64 free implementation of the EGL API -- hardware drivers
ii libegl1-amdgpu-mesa-drivers:i386 1:26.0.0.70200-2278374.22.04 i386 free implementation of the EGL API -- hardware drivers
ii libgbm1-amdgpu:amd64 1:26.0.0.70200-2278374.22.04 amd64 generic buffer management API -- runtime
ii libgbm1-amdgpu:i386 1:26.0.0.70200-2278374.22.04 i386 generic buffer management API -- runtime
ii libgl1-amdgpu-mesa-dri:amd64 1:26.0.0.70200-2278374.22.04 amd64 free implementation of the OpenGL API -- DRI modules
ii libgl1-amdgpu-mesa-dri:i386 1:26.0.0.70200-2278374.22.04 i386 free implementation of the OpenGL API -- DRI modules
ii libgl1-amdgpu-mesa-glx:amd64 1:26.0.0.70200-2278374.22.04 amd64 free implementation of the OpenGL API -- GLX runtime
ii libgl1-amdgpu-mesa-glx:i386 1:26.0.0.70200-2278374.22.04 i386 free implementation of the OpenGL API -- GLX runtime
ii libllvm20.1-amdgpu:amd64 1:20.1.70200-2278374.22.04 amd64 Modular compiler and toolchain technologies, runtime library
ii libllvm20.1-amdgpu:i386 1:20.1.70200-2278374.22.04 i386 Modular compiler and toolchain technologies, runtime library
ii libwayland-amdgpu-client0:amd64 1.24.0.70200-2278374.22.04 amd64 wayland compositor infrastructure - client library
ii libwayland-amdgpu-client0:i386 1.24.0.70200-2278374.22.04 i386 wayland compositor infrastructure - client library
ii libwayland-amdgpu-server0:amd64 1.24.0.70200-2278374.22.04 amd64 wayland compositor infrastructure - server library
ii libwayland-amdgpu-server0:i386 1.24.0.70200-2278374.22.04 i386 wayland compositor infrastructure - server library
ii llvm-amdgpu 1:20.1.70200-2278374.22.04 amd64 Low-Level Virtual Machine (LLVM)
ii llvm-amdgpu-20.1 1:20.1.70200-2278374.22.04 amd64 Modular compiler and toolchain technologies
ii llvm-amdgpu-20.1-dev 1:20.1.70200-2278374.22.04 amd64 Modular compiler and toolchain technologies, libraries and headers
ii llvm-amdgpu-20.1-runtime 1:20.1.70200-2278374.22.04 amd64 Modular compiler and toolchain technologies, IR interpreter
ii llvm-amdgpu-runtime 1:20.1.70200-2278374.22.04 amd64 Low-Level Virtual Machine (LLVM), bytecode interpreter
ii mesa-amdgpu-libgallium:amd64 1:26.0.0.70200-2278374.22.04 amd64 shared infrastructure for Mesa drivers
ii mesa-amdgpu-libgallium:i386 1:26.0.0.70200-2278374.22.04 i386 shared infrastructure for Mesa drivers
ii mesa-amdgpu-va-drivers:amd64 1:26.0.0.70200-2278374.22.04 amd64 Mesa VA-API video acceleration drivers
ii mesa-amdgpu-va-drivers:i386 1:26.0.0.70200-2278374.22.04 i386 Mesa VA-API video acceleration drivers
ii xserver-xorg-video-amdgpu 23.0.0-1ubuntu0.24.04.1 amd64 X.Org X server -- AMDGPU display driver
# dpkg -l | grep rocm
ii rocm 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) software stack meta package
ii rocm-cmake 0.14.0.70200-43~22.04 amd64 rocm-cmake built using CMake
ii rocm-core 7.2.0.70200-43~22.04 amd64 ROCm Runtime software stack
ii rocm-dbgapi 0.77.4.70200-43~22.04 amd64 Library to provide AMD GPU debugger API
ii rocm-debug-agent 2.1.0.70200-43~22.04 amd64 Radeon Open Compute Debug Agent (ROCdebug-agent)
ii rocm-developer-tools 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-device-libs 1.0.0.70200-43~22.04 amd64 Radeon Open Compute - device libraries
ii rocm-gdb 16.3.70200-43~22.04 amd64 ROCgdb
ii rocm-hip 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-hip-libraries 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-hip-runtime 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-hip-runtime-dev 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-hip-sdk 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-language-runtime 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-llvm 22.0.0.26014.70200-43~22.04 amd64 ROCm core compiler
ii rocm-ml-libraries 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-ml-sdk 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-opencl 2.0.0.70200-43~22.04 amd64 clr built using CMake
ii rocm-opencl-dev 2.0.0.70200-43~22.04 amd64 clr built using CMake
ii rocm-opencl-sdk 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack
ii rocm-openmp 7.2.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) OpenMP Software development Kit.
ii rocm-smi-lib 7.8.0.70200-43~22.04 amd64 AMD System Management libraries
ii rocminfo 1.0.0.70200-43~22.04 amd64 Radeon Open Compute (ROCm) Runtime rocminfo tool
$ rocminfo
ROCk module version 6.16.13 is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.18
Runtime Ext Version: 1.15
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 9 5900X 12-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 5900X 12-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3700
BDFID: 0
Internal Node ID: 0
Compute Unit: 24
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65738444(0x3eb16cc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 65738444(0x3eb16cc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 65738444(0x3eb16cc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65738444(0x3eb16cc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1032
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6600 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 2048(0x800) KB
L3: 32768(0x8000) KB
Chip ID: 29695(0x73ff)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2900
BDFID: 3072
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 2147483647(0x7fffffff)
y 65535(0xffff)
z 65535(0xffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 132
SDMA engine uCode:: 76
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1032
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 2147483647(0x7fffffff)
y 65535(0xffff)
z 65535(0xffff)
FBarrier Max Size: 32
ISA 2
Name: amdgcn-amd-amdhsa--gfx10-3-generic
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 2147483647(0x7fffffff)
y 65535(0xffff)
z 65535(0xffff)
FBarrier Max Size: 32
*** Done ***
(venv) USER@WORKSTATION:~$ python test-rocm.py
Checking ROCM support...
GOOD: ROCM devices found: 2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user nils is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of:
---> AMD Ryzen 9 5900X 12-Core Processor
---> gfx1032
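The output above comes from my own test-rocm.py, which is not shown here. For anyone who wants to check which PyTorch build is actually being imported, here is a minimal sketch (not the contents of test-rocm.py). It relies on `torch.version.hip`, which is `None` for builds without ROCm support; the helper name `describe_torch_build` is hypothetical.

```python
def describe_torch_build(torch_module):
    """Summarize where a torch module was loaded from and how it was built."""
    version = getattr(torch_module, "version", None)
    hip = getattr(version, "hip", None)    # version string on ROCm builds, else None
    cuda = getattr(version, "cuda", None)  # version string on CUDA builds, else None
    return {
        "file": getattr(torch_module, "__file__", None),
        "torch_version": getattr(torch_module, "__version__", None),
        "hip": hip,
        "cuda": cuda,
        "built_with_rocm": hip is not None,
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not importable in this environment")
    else:
        for key, value in describe_torch_build(torch).items():
            print(f"{key}: {value}")
```

The `file` entry is useful because it shows whether the imported `torch` is the freshly built wheel or a different installation in the same environment.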
I’ve already tried to apply the information I found at https://lernapparat.de/pytorch-rocm/ (adjusting the paths), but it didn’t help.