RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

Hi All,
I am trying to run DINO on multiple nodes using the facebookincubator/submitit repo. We have a Slurm cluster, and I am able to train DINO on a single node (8 GPUs) [WITHOUT USING submitit], but when I try to run on multiple nodes I get the error below:

submitit ERROR (2021-07-30 01:10:30,581) - Submitted job triggered an exception
Traceback (most recent call last):
  File "/home/user/skanaconda3/envs/url/lib/python3.8/", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/skanaconda3/envs/url/lib/python3.8/", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/skanaconda3/envs/url/lib/python3.8/site-packages/submitit/core/", line 11, in
  File "/home/user/skanaconda3/envs/url/lib/python3.8/site-packages/submitit/core/", line 71, in submitit_main
  File "/home/user/skanaconda3/envs/url/lib/python3.8/site-packages/submitit/core/", line 64, in process_job
    raise error
  File "/home/user/skanaconda3/envs/url/lib/python3.8/site-packages/submitit/core/", line 53, in process_job
    result = delayed.result()
  File "/home/user/skanaconda3/envs/url/lib/python3.8/site-packages/submitit/core/", line 128, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "", line 67, in call
  File "/home/user/code/dino/", line 143, in train_dino
  File "/home/user/code/dino/", line 468, in init_distributed_mode
  File "/home/user/skanaconda3/envs/url/lib/python3.8/site-packages/torch/distributed/", line 439, in init_process_group
    _default_pg = _new_process_group_helper(
  File "/home/user/skanaconda3/envs/url/lib/python3.8/site-packages/torch/distributed/", line 528, in _new_process_group_helper
    pg = ProcessGroupNCCL(
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

From the logs, I see that the job is initially assigned to two nodes [with 8 GPUs in each node] and then stops with the above error. I think the code crashes at this line. Why does at::cuda::getNumGPUs() return 0 when there are GPUs available?

Thanks in advance!

I am not familiar with submitit, so I am unsure how to validate the number of GPUs it is using. Before init_process_group, can you also try printing the value of torch.cuda.device_count()? (See torch.cuda.device_count in the PyTorch master documentation.) This may help narrow down why no GPUs are detected and whether this is an issue in the distributed package.
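
Something like the sketch below could be called right before init_process_group in each task. It is only a rough diagnostic, not part of DINO or submitit: the helper name gpu_visibility_report is made up for illustration, and it assumes the job runs under Slurm so that SLURM_PROCID, SLURM_LOCALID, and SLURM_NODEID are set. It also prints CUDA_VISIBLE_DEVICES, since an empty value there would hide every GPU from the process, which is one possible (unconfirmed) explanation for a device count of 0.

```python
import os
import socket


def gpu_visibility_report():
    """Collect what this process can see about GPUs and its Slurm task slot.

    Hypothetical helper for debugging; not part of DINO, submitit, or PyTorch.
    """
    report = {
        "host": socket.gethostname(),
        # None means the variable is unset; an empty string hides all GPUs.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # Standard Slurm per-task variables (None when not running under Slurm).
        "SLURM_PROCID": os.environ.get("SLURM_PROCID"),
        "SLURM_LOCALID": os.environ.get("SLURM_LOCALID"),
        "SLURM_NODEID": os.environ.get("SLURM_NODEID"),
    }
    try:
        import torch

        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
    except ImportError:
        # PyTorch is missing from this environment, so nothing can be queried.
        report["cuda_available"] = None
        report["device_count"] = None
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        # flush=True so the line reaches the job log even if the task crashes next.
        print(f"{key}: {value}", flush=True)
```

If device_count is 0 on the compute nodes while CUDA_VISIBLE_DEVICES is empty or unset, it would be worth checking whether the submitted job actually requested GPUs from Slurm, rather than looking at the distributed package first.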