I am trying to install vLLM, whose CMake build configures against PyTorch, on a supercomputer. However, the CMake configure step prints the following:
(venv) liyumin@setonix-01:~/scratch/DeepSeek-V3/inference/vllm> cmake /scratch/pawsey1001/liyumin/DeepSeek-V3/inference/vllm -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DVLLM_TARGET_DEVICE=cuda -DVLLM_PYTHON_EXECUTABLE=/usr/bin/python3 -DVLLM_PYTHON_PATH=/home/liyumin/.local/lib/python3.10/site-packages:/scratch/pawsey1001/liyumin/DeepSeek-V3/inference/vllm:/software/setonix/2023.08/containers/modules-long/quay.io/pawsey/pytorch/2.2.0-rocm5.7.3/bin/python -DFETCHCONTENT_BASE_DIR=/scratch/pawsey1001/liyumin/DeepSeek-V3/inference/vllm/.deps -DCMAKE_JOB_POOL_COMPILE:STRING=compile -DCMAKE_JOB_POOLS:STRING=compile=256 -DCMAKE_MODULE_PATH=/opt/rocm/hip/ -DHIP_ROOT_DIR=/opt/rocm
-- Build type: RelWithDebInfo
-- Target device: cuda
-- Found python matching: /usr/bin/python3.
Building PyTorch for GPU arch: gfx90a
-- Could NOT find HIP: Found unsuitable version "0.0.0", but required is at least "1.0" (found /opt/rocm)
HIP VERSION: 0.0.0
CMake Warning at /home/liyumin/.local/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/home/liyumin/.local/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
CMakeLists.txt:81 (find_package)
Related information:
(venv) liyumin@setonix-01:~/scratch/DeepSeek-V3/inference/vllm> hipcc --version
HIP version: 5.7.31921-1949b1621
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.1 23382 f3e174a1d286158c06e4cc8276366b1d4bc0c914)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/llvm/bin
(venv) liyumin@setonix-01:~/scratch/DeepSeek-V3/inference/vllm> hipconfig
HIP version : 5.7.31921-1949b1621
== hipconfig
HIP_PATH : /opt/rocm
ROCM_PATH : /opt/rocm
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME : rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm/include -I/opt/rocm-5.7.1/llvm/lib/clang/17.0.0
== hip-clang
HIP_CLANG_PATH : /opt/rocm/llvm/bin
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.1 23382 f3e174a1d286158c06e4cc8276366b1d4bc0c914)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/llvm/bin
AMD LLVM version 17.0.0git
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver3
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags : -isystem "/opt/rocm/include" -O3 --hip-path="/opt/rocm"
hip-clang-ldflags : -O3 --hip-path="/opt/rocm" --hip-link --rtlib=compiler-rt -unwindlib=libgcc
=== Environment Variables
PATH=/opt/rocm/bin:/home/liyumin/.local/bin:/software/setonix/2023.08/containers/modules-long/quay.io/pawsey/pytorch/2.2.0-rocm5.7.3/bin/:/software/projects/pawsey1001/liyumin/venv/bin:/software/setonix/2024.05/pawsey/software/shpc/lib/python3.11/site-packages/modules/quay.io/pawsey/pytorch/2.2.0-rocm5.7.3/bin:/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/cryptsetup-2.3.5-rz5tb6ah4pfy4s2xix3ayn5wfl5ueykt/sbin:/software/setonix/2024.05/software/linux-sles15-zen3/gcc-12.2.0/singularityce-4.1.0-2gadr2xoc2nb4prnnyq2vvztjh6x4wzl/bin:/opt/cray/pe/mpich/8.1.27/ofi/gnu/9.1/bin:/opt/cray/pe/mpich/8.1.27/bin:/opt/cray/pe/craype/2.7.23/bin:/opt/cray/pe/gcc/12.2.0/snos/bin:/software/pawsey/tools/pawseytools/bin:/opt/cray/pe/perftools/23.09.0/bin:/opt/cray/pe/papi/7.0.1.1/bin:/opt/cray/libfabric/1.15.2.0/bin:/opt/clmgr/sbin:/opt/clmgr/bin:/opt/sgi/sbin:/opt/sgi/bin:/home/liyumin/.local/bin:/usr/local/bin:/usr/bin:/bin:/sbin:/opt/cray/pe/bin
LD_LIBRARY_PATH=/opt/rocm/lib:/opt/cray/pe/papi/7.0.1.1/lib64:/opt/cray/libfabric/1.15.2.0/lib64
HIP_PATH=/opt/rocm
== Linux Kernel
Hostname : setonix-01
Linux setonix-01 5.14.21-150500.55.83_13.0.62-cray_shasta_c #1 SMP Wed Dec 4 02:58:09 UTC 2024 (ac8aa5f) x86_64 x86_64 x86_64 GNU/Linux
How can I fix or debug the "Could NOT find HIP" error, given that hipcc and hipconfig both report HIP 5.7?
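As a starting point for debugging, here is a minimal sketch that compares the version string hipconfig reports with the minimum that CMake asked for ("1.0"). The helper names hip_version_triplet and version_at_least are hypothetical (they are not part of ROCm or vLLM), the sample string is copied from the hipconfig output above, and the script assumes a `sort` that supports `-V`. It also assumes that CMake's HIP detection derives its version from similar output, which I have not confirmed.

```shell
#!/bin/sh
# Hypothetical helpers for comparing HIP versions; not part of ROCm or vLLM.

# Extract "major.minor.patch" from a string such as "5.7.31921-1949b1621"
# (the part after the dash looks like a commit hash and is dropped).
# Prints "0.0.0" when the input does not look like a version, mirroring
# the value CMake printed in the log above.
hip_version_triplet() {
    hvt_base="${1%%-*}"
    if printf '%s' "$hvt_base" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        printf '%s\n' "$hvt_base"
    else
        printf '0.0.0\n'
    fi
}

# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), an assumption about the local sort.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Use the live value when hipconfig is on PATH; otherwise fall back to the
# sample string taken from the output in this post.
if command -v hipconfig >/dev/null 2>&1; then
    reported="$(hipconfig --version)"
else
    reported="5.7.31921-1949b1621"
fi

parsed="$(hip_version_triplet "$reported")"
echo "hipconfig reports: $reported"
echo "parsed version:    $parsed"

if version_at_least "$parsed" "1.0"; then
    echo "Version satisfies the 1.0 minimum, so the 0.0.0 seen by CMake likely comes from how CMake locates or queries HIP."
else
    echo "Version could not be parsed, which would match the 0.0.0 seen by CMake."
fi
```

If the script reports a usable version in the same shell where cmake prints 0.0.0, the toolchain itself is probably fine, and the things to look at would be the values passed via -DCMAKE_MODULE_PATH and -DHIP_ROOT_DIR and the environment in which cmake runs.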