Could NOT find CUDA (missing: CUDA_CUDART_LIBRARY) (found version "12.5")

With CMake version 3.30.2 and CUDA 12.5.1, I get the following error when trying to build PyTorch 2.4.0:

-- Could NOT find CUDA (missing: CUDA_CUDART_LIBRARY) (found version "12.5")
CMake Warning at cmake/public/cuda.cmake:31 (message):
  Caffe2: CUDA cannot be found.  Depending on whether you are building Caffe2
  or a Caffe2 dependent library, the next warning / error will give you more
  info.
Call Stack (most recent call first):
  cmake/Dependencies.cmake:43 (include)
  CMakeLists.txt:853 (include)


CMake Warning at cmake/Dependencies.cmake:74 (message):
  Not compiling with CUDA.  Suppress this warning with -DUSE_CUDA=OFF.

Perhaps this is a CMake issue? I didn’t have the problem with CMake 3.29.4, and I’ve since upgraded to 3.30.2.

It might be, since cmake==3.30.2 was released only a week ago. Am I understanding your claim correctly that cmake==3.29.4 works for you?

I downgraded CMake to 3.29.4, but I’m still getting the same issue:
disabling CUDA because USE_CUDA is set false
Yet I have USE_CUDA explicitly set: export USE_CUDA=1 and -DUSE_CUDA=ON.
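For reference, this is roughly how I'm enabling it (a minimal sketch; the build directory is illustrative):

  # setup.py-driven builds read the environment variable:
  export USE_CUDA=1
  # direct CMake configures read the cache variable:
  cmake -DUSE_CUDA=ON ..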

It does say: Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
But then it says disabling CUDA because NOT USE_CUDA is set.

I haven’t seen this issue myself, and the few references I can find point to a wrong docker run command that doesn’t allow GPU access. I don’t know if this is related to your issue or not.
You could first check whether you are able to build any other CUDA application from source (e.g. the CUDA samples).
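Something along these lines (an untested sketch; the sample path depends on which release of the samples repo you clone):

  git clone https://github.com/NVIDIA/cuda-samples.git
  cd cuda-samples/Samples/1_Utilities/deviceQuery
  make
  ./deviceQuery   # should enumerate your GPU if the toolkit and driver work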

Yes, I can build a CUDA example (one that doesn’t use CMake). It seems to be a CMake issue.

Hi, I mentioned this on the issue, and it’s a naive guess, but could you try recompiling after cleaning the build? E.g., run python3 setup.py clean right before python3 setup.py install?
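Concretely, something like this (just a sketch of the usual source-build flow):

  python3 setup.py clean     # removes the stale build/ directory and CMake cache
  python3 setup.py install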

Where exactly? This is how the SlackBuild script does it:

cd build
  # unshare -n runs the configure step with networking disabled (no downloads)
  unshare -n cmake \
    -G Ninja \
    -DCMAKE_C_FLAGS:STRING="$SLKCFLAGS" \
    -DCMAKE_CXX_FLAGS:STRING="$SLKCFLAGS" \
    -DCMAKE_CXX_STANDARD=17 \
    -DCMAKE_INSTALL_PREFIX="/usr" \
    -DLIBSHM_INSTALL_LIB_SUBDIR="lib$LIBDIRSUFFIX" \
    -DTORCH_INSTALL_LIB_DIR="lib$LIBDIRSUFFIX" \
    -DPYTHON_EXECUTABLE=$(which python3) \
    -DBUILD_CUSTOM_PROTOBUF=OFF \
    -DBUILD_TEST=OFF \
    -DUSE_FFMPEG=ON \
    -DUSE_GOLD_LINKER=ON \
    -DUSE_OPENCL=ON \
    -DUSE_OPENCV=ON \
    -DUSE_VULKAN=ON \
    -DCMAKE_BUILD_TYPE=Release ..
  "${NINJA:=ninja}"
  DESTDIR=tmpxxx $NINJA install/strip

  mkdir -p $PKG/usr/{share,lib$LIBDIRSUFFIX}
  mv tmpxxx/usr/bin $PKG/usr
  mv tmpxxx/usr/include $PKG/usr
  mv tmpxxx/usr/share/cmake $PKG/usr/share
  mv tmpxxx/usr/lib$LIBDIRSUFFIX/*.so $PKG/usr/lib$LIBDIRSUFFIX
cd ..
python3 setup.py install --root=$PKG

Adding python3 setup.py clean right before the last line above didn’t change anything. I still get:

Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
and
disabling CUDA because NOT USE_CUDA is set

Hi, did you solve the issue? I’m running into the same problem.

Not yet.

I should also mention that I fixed the

Building wheel torch-2.4.0a0+gitUnknown
-- Building version 2.4.0a0+gitUnknown
Could not find any of CMakeLists.txt, Makefile, setup.py, LICENSE, LICENSE.md, LICENSE.txt in /tmp/SBo/pytorch-v2.4.0/third_party/QNNPACK
Did you run 'git submodule update --init --recursive'?

by running sed -i '/"QNNPACK"/d' setup.py. I’m wondering whether that workaround triggered the CUDA issue I’m having.
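(For completeness: the error message itself suggests the usual fix, fetching the submodules instead of deleting the entry, which applies when building from a git checkout:

  git submodule sync
  git submodule update --init --recursive
)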

I’m still getting this issue when building PyTorch 2.5.1.

I updated to CUDA toolkit 12.6.2 and NVIDIA driver 560.35.03, and now it works.
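To confirm the versions before and after updating (illustrative commands):

  nvidia-smi | head -n 5   # prints the driver version, e.g. 560.35.03
  nvcc --version           # prints the toolkit version, e.g. 12.6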

Thanks a lot. I’ll check it.


Although it’s not the same project, I ran into a similar issue in vLLM, which looks like this:

running egg_info
writing vllm.egg-info/PKG-INFO
writing dependency_links to vllm.egg-info/dependency_links.txt
writing entry points to vllm.egg-info/entry_points.txt
writing requirements to vllm.egg-info/requires.txt
writing top-level names to vllm.egg-info/top_level.txt
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file 'vllm.egg-info/SOURCES.txt'
running build_ext
-- Build type: RelWithDebInfo
-- Target device: cuda
-- Found python matching: /data1/jyj/micromamba/envs/vllm/bin/python.
-- Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS) (found version "12.4")
CMake Warning at /data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:31 (message):
  PyTorch: CUDA cannot be found.  Depending on whether you are building
  PyTorch or a PyTorch dependent library, the next warning / error will give
  you more info.
Call Stack (most recent call first):
  /data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
  /data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:85 (find_package)


CMake Error at /data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:90 (message):
  Your installed Caffe2 version uses CUDA but I cannot find the CUDA
  libraries.  Please set the proper CUDA prefixes and / or install CUDA.
Call Stack (most recent call first):
  /data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:85 (find_package)


-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
  File "/data1/jyj/DengSeek/others/vllm/setup.py", line 676, in <module>
    setup(
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/__init__.py", line 117, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 186, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
    dist.run_commands()
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 983, in run_commands
    self.run_command(cmd)
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/dist.py", line 999, in run_command
    super().run_command(command)
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1002, in run_command
    cmd_obj.run()
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/command/develop.py", line 35, in run
    self.install_for_development()
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/command/develop.py", line 112, in install_for_development
    self.run_command('build_ext')
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 339, in run_command
    self.distribution.run_command(command)
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/dist.py", line 999, in run_command
    super().run_command(command)
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1002, in run_command
    cmd_obj.run()
  File "/data1/jyj/DengSeek/others/vllm/setup.py", line 267, in run
    super().run()
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/command/build_ext.py", line 99, in run
    _build_ext.run(self)
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py", line 365, in run
    self.build_extensions()
  File "/data1/jyj/DengSeek/others/vllm/setup.py", line 226, in build_extensions
    self.configure(ext)
  File "/data1/jyj/DengSeek/others/vllm/setup.py", line 204, in configure
    subprocess.check_call(
  File "/data1/jyj/micromamba/envs/vllm/lib/python3.12/subprocess.py", line 415, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '/data1/jyj/DengSeek/others/vllm', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=/data1/jyj/micromamba/envs/vllm/bin/python', '-DVLLM_PYTHON_PATH=/data1/jyj/DengSeek/others/vllm:/data1/jyj/micromamba/envs/vllm/lib/python312.zip:/data1/jyj/micromamba/envs/vllm/lib/python3.12:/data1/jyj/micromamba/envs/vllm/lib/python3.12/lib-dynload:/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages:/data1/jyj/micromamba/envs/vllm/lib/python3.12/site-packages/setuptools/_vendor', '-DFETCHCONTENT_BASE_DIR=/data1/jyj/DengSeek/others/vllm/.deps', '-DNVCC_THREADS=1']' returned non-zero exit status 1.

I don’t have sudo rights on my cluster, so I installed CUDA inside my micromamba venv.

It turns out nvcc is found fine; however, the CUDA libraries can’t be linked.

For instance, you can’t just set your_mamba_venv/ or your_conda_venv/ as CUDA_HOME, because in that folder lib/ and include/ contain no CUDA-related files; only bin/ has nvcc. That seems odd, but it’s true.

That’s because the toolkit is actually installed under your_env/targets/x86_64-linux: the bin/ inside that folder also contains nvcc, and it’s the include/ and lib/ there that make it a proper CUDA_HOME.
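You can verify this layout with a quick check (assuming $CONDA_PREFIX points at your activated micromamba env):

  ls $CONDA_PREFIX/targets/x86_64-linux/bin/nvcc
  ls $CONDA_PREFIX/targets/x86_64-linux/include/cuda_runtime.h
  ls $CONDA_PREFIX/targets/x86_64-linux/lib/libcudart.so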

Even then the build wouldn’t compile in the venv; I had to change the CMakeLists.txt:

set(CUDA_TOOLKIT_ROOT_DIR "your_venv/targets/x86_64-linux")   # where conda/mamba actually unpacks the toolkit
set(CUDA_INCLUDE_DIRS "${CUDA_TOOLKIT_ROOT_DIR}/include")     # the include dir FindCUDA failed to locate
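As an alternative I haven’t tested, it might be possible to avoid editing CMakeLists.txt by exporting the paths before building, since vLLM’s setup.py takes CUDA_HOME from torch.utils.cpp_extension:

  export CUDA_HOME=$CONDA_PREFIX/targets/x86_64-linux
  export PATH=$CUDA_HOME/bin:$PATH
  export LD_LIBRARY_PATH=$CUDA_HOME/lib:$LD_LIBRARY_PATH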