RuntimeError: No CUDA GPUs are available

Hello,
I am using a recent project called vLLM to run LLM inference on an A100 GPU.
In certain cases (e.g. bfloat16 precision) it calls torch.cuda.get_device_capability() to check the device capability (see https://github.com/vllm-project/vllm/blob/main/vllm/config.py#L345), and I get the error below.

prop = get_device_properties(device)
File "site-packages/torch/cuda/__init__.py", line 395, in
_lazy_init()  # will define _get_device_properties
File "site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
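
For context, the same check can be reproduced outside vLLM with a few lines of plain PyTorch; running a snippet like this (nothing vLLM-specific, only standard torch.cuda calls) inside the failing process shows whether CUDA initialization itself is the problem:

import torch

# torch.cuda.get_device_capability() goes through the same lazy init path as the
# traceback above (_lazy_init() -> torch._C._cuda_init()).
print("is_available :", torch.cuda.is_available())
print("device_count :", torch.cuda.device_count())
print("capability   :", torch.cuda.get_device_capability(0))  # e.g. (8, 0) on an A100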

Now the interesting part: when I run vLLM standalone, I don't get this error. With the same configs, when I run it inside my application (a Python server), I do get this error.

Any thoughts on what specific scenario or environment variables might trigger this? I have checked most of the standard stuff.

What do “standalone” and “Python server” mean in this context? Are you using different virtual environments? If so, make sure both have a proper PyTorch binary with CUDA support installed.
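
A quick sanity check you can run in both environments is something like the following (only standard torch attributes; a CPU-only wheel reports a version without a CUDA suffix and torch.version.cuda is None):

import torch

print(torch.__version__)          # e.g. "2.1.0+cu121" for a CUDA build, "2.1.0+cpu" for CPU-only
print(torch.version.cuda)         # CUDA version the wheel was built against, or None
print(torch.cuda.is_available())  # False means this process cannot see a usable GPU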

Yeah, by “standalone” I mean running vLLM on its own: it brings up a gunicorn server that we can hit for inference. The “Python server” is a separate server into which we integrate vLLM. Both of these setups run with the same PYTHONPATH and environment variables.

Can you think of any specific reason why this might show up in the case I mentioned? I know there's no actual problem with the CUDA setup, since it works in the direct vLLM path; I just need to understand what triggers it in the failing path.
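
For completeness, here is a minimal sketch of the kind of check that can be run inside the failing server process and compared with the standalone run (just standard os/torch calls; the env var names are the usual CUDA ones):

import os
import torch

# What this particular process actually sees, regardless of how the shell was set up.
print("CUDA_VISIBLE_DEVICES  :", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("NVIDIA_VISIBLE_DEVICES:", os.environ.get("NVIDIA_VISIBLE_DEVICES"))
print("torch.cuda.is_available():", torch.cuda.is_available())
print("torch.cuda.device_count():", torch.cuda.device_count())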