Hi,
I'm trying to run openai/gpt-oss-20b on vLLM with:
- 3 × RTX 3090 (24 GB each)
- CUDA 12.8, Driver 570.xx
- Python 3, PyTorch 2.7.0
I want to split the model across all three GPUs, but I'm hitting a FlashAttention error (it looks like FlashAttention 3 is required). I've included roughly how I'm launching it below.
Any tips for a multi-GPU vLLM setup and for installing FlashAttention 3 in this environment?
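
For reference, this is roughly the invocation that fails for me, using vLLM's Python API (a minimal sketch; the exact flags and dtype are my best guess, not a verified config):

```python
from vllm import LLM, SamplingParams

# Attempting to shard the model across the three 3090s via tensor parallelism.
# This is the point where the FlashAttention error shows up for me.
llm = LLM(
    model="openai/gpt-oss-20b",
    tensor_parallel_size=3,   # one shard per RTX 3090 (not sure 3-way TP is valid here)
    dtype="bfloat16",
)

params = SamplingParams(max_tokens=64)
print(llm.generate(["Hello"], params)[0].outputs[0].text)
```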
Thanks!