Hello,
I am using a recent project called vLLM to run LLM inference on an A100 GPU.
For certain cases (e.g. bfloat16 precision), when it calls torch.cuda.get_device_capability() to check the device's compute capability (see https://github.com/vllm-project/vllm/blob/main/vllm/config.py#L345 ), I get the error below.
```
prop = get_device_properties(device)
  File "site-packages/torch/cuda/__init__.py", line 395, in
    _lazy_init()  # will define _get_device_properties
  File "site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
```
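For reference, the failing call boils down to something like the snippet below (device index 0 and the capability-8.0 threshold for bfloat16 are my assumptions for a single-A100 setup, not copied from the vLLM source):

```python
import torch

# Rough equivalent of the check vLLM performs before enabling bfloat16.
# get_device_capability() triggers torch.cuda's lazy initialization, which
# is where the "No CUDA GPUs are available" error is raised.
major, minor = torch.cuda.get_device_capability(0)   # assumes device index 0
print(f"Compute capability: {major}.{minor}")        # expect 8.0 on an A100
assert major >= 8, "bfloat16 assumed to need compute capability >= 8.0"
```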
Now the interesting part: when I run vLLM standalone, I don't get this error. With the same configs, running it inside my application (a Python server), I do.
Any thoughts on what specific scenario or env vars might trigger this? I have already checked the standard things.
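In case it helps narrow things down, this is roughly the kind of diagnostic I am running from inside the server process, right before vLLM is constructed (nothing vLLM-specific, just standard PyTorch/CUDA visibility checks):

```python
import os
import torch

# Printed from inside the server process, just before vLLM is initialized.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("torch.cuda.is_available() =", torch.cuda.is_available())
print("torch.cuda.device_count() =", torch.cuda.device_count())
```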