I’ve been struggling with this for ages. NVIDIA only supports CUDA 12 for Fedora 36 and beyond. Fedora 35 is deprecated. CUDA 12 has been out since September. PyTorch supposedly has supported CUDA 12 since late December, but it’s mid-March and there’s still no wheel for it - half a year after it came out.
I’ve been trying to compile it myself (gcc 12.2.1), but that has failed so far. It appeared to be failing due to a ton of warnings combined with PyTorch’s use of -Werror (I’m a big fan of -Werror in development… but in production?). I guessed that my gcc was too new and so downgraded it, but then I got even more errors. Now I’m trying again with -Werror suppressed to see how that goes…
Any chance we’ll get a wheel at any point? Thanks!
(ED: Nope, compile failed… Fbgemm is still using Werror for some reason and complaining about possible uninitialized variables… sigh…)
(ED2: Well, I managed to shoehorn it into compiling by literally replacing my systemwide gcc/g++/cpp/c++ binaries with wrapper scripts that strip out all -Werror arguments. But now when I actually include torch/python.h, type_caster is broken:
/usr/local/lib64/python3.10/site-packages/torch/include/pybind11/detail/…/cast.h:42:120: error: expected template-name before ‘<’ token
42 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
/usr/local/lib64/python3.10/site-packages/torch/include/pybind11/detail/…/cast.h:42:120: error: expected identifier before ‘<’ token
… I give up )
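For anyone who wants to try the same workaround without overwriting system binaries, here is a minimal sketch of a wrapper that strips -Werror flags. It is not the script used above, only an illustration: the names `strip_werror` and `run_compiler`, the `REAL_CC` environment variable, and the `/usr/bin/gcc` default are all hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a compiler wrapper that drops -Werror flags (hypothetical names)."""
import os

# Assumption: location of the real compiler; override via the REAL_CC variable.
REAL_COMPILER = os.environ.get("REAL_CC", "/usr/bin/gcc")


def strip_werror(args):
    """Return args without -Werror and -Werror=<warning>.

    Flags such as -Wno-error=<warning> are kept, since they relax checks
    rather than tighten them.
    """
    return [a for a in args if a != "-Werror" and not a.startswith("-Werror=")]


def run_compiler(argv):
    """Replace the current process with the real compiler and filtered flags."""
    os.execv(REAL_COMPILER, [REAL_COMPILER] + strip_werror(argv[1:]))


# In an actual wrapper script, the last line would be:
#     run_compiler(sys.argv)   # requires `import sys`
```

Placing such a script earlier on PATH, or pointing the build at it through the CC and CXX environment variables, avoids touching the system compiler. Whether a given build system honours those variables is something to check per project.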
We are still in the process of discussing and merging the last few needed changes into the code base before starting with the CUDA 12.x bringup for the binaries.
If you need to use this CUDA version, you would indeed need to build PyTorch from source or just use the latest NGC containers.
As a workaround for now you could use the current binaries (stable or nightly) with CUDA 11.7 or 11.8 which already support all released GPU architectures.
Also note that your locally installed CUDA toolkit won’t be used when executing the binaries as they ship with their own dependencies unless you build a custom CUDA extension.
Okay, thanks for the update (I know that maintaining packages is a rather thankless job!)
My gamer box has CUDA 12.0 drivers. I installed CUDA 11.8 and it seemed to work, but should it have?
Yes, since the PyTorch binaries ship with their own CUDA dependencies and work with a properly installed driver. Your locally installed CUDA toolkit will be used if you build PyTorch from source or a custom CUDA extension.
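To see this distinction on your own machine, a small sketch like the following reports the CUDA version the installed PyTorch binary ships with alongside any locally installed toolkit. The function name `cuda_report` and the dictionary keys are made up for illustration; `torch.version.cuda` and `torch.cuda.is_available()` are the only PyTorch calls used.

```python
import importlib.util
import shutil


def cuda_report():
    """Collect the CUDA version bundled with PyTorch and the local nvcc path, if any."""
    report = {
        "torch_installed": False,
        "bundled_cuda": None,                 # CUDA version shipped inside the binary
        "gpu_usable": False,                  # True only with a working driver and GPU
        "local_nvcc": shutil.which("nvcc"),   # local toolkit compiler, or None
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment

    import torch

    report["torch_installed"] = True
    report["bundled_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["gpu_usable"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `bundled_cuda` differs from the version of the local toolkit, that is expected for the prebuilt binaries. The local toolkit only matters for source builds and custom CUDA extensions.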
You can also install the nightlies with 12.1 now.
Are there nightly Docker images? I found the pytorch-nightly package on GitHub, but it gives no information about the CUDA version used in those images.
I’m not familiar with the nightly Docker build process, but you might want to pull the container and check the CUDA version it uses, for example by printing torch.version.cuda from Python inside the container.
Thanks, I will do that.
I am asking because the latest PyTorch Docker image, 2.0.1-cuda11.7-cudnn8-runtime, does not seem to support A100 and H100 GPUs (arch ‘sm_90’).
Yes, CUDA 11.7 supports sm_80 (A100) but not sm_90 (H100); for sm_90 you would need CUDA >= 11.8.
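As a rough illustration of that rule, the check below encodes the minimum CUDA release per architecture. The sm_90 entry comes from this thread; the sm_80 minimum of 11.0 is added here as background, and the names `MIN_CUDA_FOR_ARCH` and `cuda_supports` are hypothetical.

```python
# Minimum CUDA toolkit release able to target each architecture, as (major, minor).
# Only the two architectures discussed in this thread are listed.
MIN_CUDA_FOR_ARCH = {
    "sm_80": (11, 0),  # A100 (assumption added here, not stated in the thread)
    "sm_90": (11, 8),  # H100 (per the reply above)
}


def parse_version(version):
    """Turn a string such as '11.8' into a comparable tuple such as (11, 8)."""
    return tuple(int(part) for part in version.split(".")[:2])


def cuda_supports(arch, cuda_version):
    """Return True if the given CUDA release can target the architecture."""
    return parse_version(cuda_version) >= MIN_CUDA_FOR_ARCH[arch]


if __name__ == "__main__":
    for version in ("11.7", "11.8", "12.1"):
        print(version, "supports sm_90:", cuda_supports("sm_90", version))
```

For an installed binary, `torch.cuda.get_arch_list()` reports the architectures it was actually compiled for, which is the more direct check when PyTorch is already available.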
Bumping this thread – is there a plan for a CUDA 12-compatible stable release imminently, or will it be just nightlies for the foreseeable future?
The next stable release, 2.1.0, will use CUDA 11.8 and 12.1.
Thanks, Piotr! Is there a planned release date for 2.1.0, or is that still TBD?
(Context is I need to build an H100-compatible image soon, and I’m figuring out whether to go to 11.8 now or if I can wait a few weeks and skip to 12.x – while sticking with wheel installation.)
2.1.0 is supposed to be released ~Oct. 2023, but note that the nightly binaries already support CUDA 12.1 and ship with it, so you could try these out in case nightly works for you.