This code runs successfully with the default gloo backend, but it fails when I switch to the nccl backend.
Note that my installed CUDA version is higher than the one PyTorch was built with; however, I expected it to work without problems, as per "Install pytorch with Cuda 12.1".
I have also tried Python 3.8, but the problem persists.
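In case it helps with narrowing this down, here is a minimal sketch of the checks that seem relevant to a gloo-versus-nccl difference. It is not the failing script. The helper names (`nccl_debug_env`, `choose_backend`, `report`) are hypothetical and only for illustration; `NCCL_DEBUG` is the standard NCCL logging variable, and the sketch assumes the bundled NCCL can be queried with `torch.cuda.nccl.version()`.

```python
import os


def nccl_debug_env(base=None):
    """Return a copy of the environment with NCCL logging enabled.

    An existing NCCL_DEBUG value is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("NCCL_DEBUG", "INFO")  # makes NCCL print init/transport details
    return env


def choose_backend(requested, cuda_available, nccl_available):
    """Return "nccl" only if it was requested and can be used, else "gloo"."""
    if requested == "nccl" and cuda_available and nccl_available:
        return "nccl"
    return "gloo"


def report():
    """Print the version facts that matter for NCCL and return a usable backend."""
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        print("PyTorch is not installed in this interpreter")
        return None

    cuda_ok = torch.cuda.is_available()
    nccl_ok = dist.is_available() and dist.is_nccl_available()
    print("torch:", torch.__version__, "| built with CUDA:", torch.version.cuda)
    print("visible GPUs:", torch.cuda.device_count() if cuda_ok else 0)
    if cuda_ok and nccl_ok:
        print("bundled NCCL:", torch.cuda.nccl.version())
    return choose_backend("nccl", cuda_ok, nccl_ok)
```

Calling `report()` in the same interpreter that runs the training script shows which NCCL build PyTorch ships with, and launching the script with the environment from `nccl_debug_env()` should make the nccl failure print more detail than the bare exception.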
Below is my environment:
PyTorch version: 2.0.0+cu117
CUDA used to build PyTorch: 11.7
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Libc version: glibc-2.35
Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.0-60-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.0.140
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
Nvidia driver version: 525.60.13
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True