I’m trying to run PyTorch on a server with eight A100-PCIE-40GB GPUs,
but torch.cuda.is_available() returns False.
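For completeness, the check that fails is essentially the following; is_available() returning False is the only result I have quoted above, and device_count() is just an extra diagnostic I can report if useful.

import torch

# This is the call that returns False on this server.
print("cuda available:", torch.cuda.is_available())

# Number of GPUs PyTorch can see; this is 0 whenever CUDA is unavailable.
print("device count:", torch.cuda.device_count())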
The server doesn’t have an NVSwitch, so I don’t believe the issue is related to nvidia-fabricmanager.
Running lspci | grep -i nvswitch produces no output. nvcc --version shows CUDA 12.4, and the nvidia-smi output appears normal.
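In case it helps, this is roughly how I would cross-check the installed PyTorch wheel against the driver; torch.version.cuda is the CUDA version the wheel was built against (None for a CPU-only build), and nvidia-smi -L simply lists the GPUs the driver sees. I have not pasted the output of this here.

import subprocess
import torch

# CUDA version the installed PyTorch build targets (None for CPU-only wheels).
print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)

# Driver-side view of the GPUs, equivalent to running nvidia-smi -L in a shell.
print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)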