Hello everyone, when I use DeepSpeed I get the error below:
My machine has an RTX 5090, with CUDA 12.2 and torch 2.6.0+cu126.
How can I fix this error?
Can you run the `collect_env.py` script (torch/utils/collect_env.py on the main branch of pytorch/pytorch on GitHub) so we get a full view of your environment setup?
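The full report is also available via `python -m torch.utils.collect_env`. If you only want a quick look at the fields that matter most for a CUDA/NCCL mismatch, here is a minimal sketch. The helper name `summarize_env` is hypothetical, and this is not a replacement for the full report:

```python
import importlib.util


def summarize_env() -> dict:
    """Collect the version fields most relevant to CUDA/NCCL mismatches (hypothetical helper)."""
    info = {"torch": None, "torch_cuda": None, "gpu": None,
            "capability": None, "nccl": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this interpreter
    import torch

    info["torch"] = str(torch.__version__)
    info["torch_cuda"] = torch.version.cuda  # CUDA version the wheel was built with (None for CPU-only builds)
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)  # (major, minor)
        try:
            import torch.cuda.nccl as nccl
            info["nccl"] = nccl.version()
        except (ImportError, AttributeError, RuntimeError):
            pass  # this build does not bundle NCCL (e.g. on Windows)
    return info


if __name__ == "__main__":
    for key, value in summarize_env().items():
        print(f"{key}: {value}")
```

Note that `torch_cuda` is the CUDA version the wheel ships with, which is independent of the system toolkit version reported by `nvcc`.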
This error likely points to a CUDA/NCCL incompatibility. Also, if you're using CUDA 12.2 but torch 2.6.0+cu126, torch might be relying on features that only exist in CUDA 12.6. You should use the build compatible with CUDA 11.8 (cu118).
Blackwell GPUs (such as the RTX 5090) require CUDA 12.8, so install the latest stable release (2.7.0) or a nightly binary built with CUDA 12.8, and it should work.
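To confirm whether an installed wheel actually ships kernels for your GPU, you can compare the device's compute capability against `torch.cuda.get_arch_list()`. An RTX 5090 reports compute capability 12.0 (`sm_120`), and a wheel built against CUDA 12.6 cannot contain `sm_120` kernels. The sketch below is a simplified check: it only looks for a native `sm_XY` entry and ignores PTX (`compute_XY`) entries that the driver might JIT-compile. The function names are hypothetical:

```python
def native_arch_tag(capability: tuple) -> str:
    """Turn a (major, minor) compute capability into an sm_XY tag, e.g. (12, 0) -> 'sm_120'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def wheel_has_native_kernels(arch_list: list, capability: tuple) -> bool:
    """Simplified check: True only if the wheel lists SASS for this exact architecture.

    PTX entries (compute_XY) are deliberately ignored.
    """
    return native_arch_tag(capability) in arch_list


def check_current_gpu() -> None:
    """Print whether the installed PyTorch wheel has native kernels for GPU 0."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return
    cap = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
    print("arch list:", archs)
    print(native_arch_tag(cap), "native kernels present:", wheel_has_native_kernels(archs, cap))


if __name__ == "__main__":
    check_current_gpu()
```

If the check prints `False` on a 5090, pick the CUDA 12.8 option in the PyTorch install selector (for example, `pip install torch --index-url https://download.pytorch.org/whl/cu128`) and run it again.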