Flash attention compilation warning?

Hello folks… can anyone advise why, after upgrading to PyTorch 2.2.0 (installed via pip on Windows 10, RTX A2000 GPU), I am getting the following warning:
AppData\Roaming\Python\Python311\site-packages\torch\nn\functional.py:5476: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at …\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)

My code wasn't changed; I'm using the same code that worked with torch 2.1.

The release notes claim FlashAttention-v2 support, but for some reason the PyTorch wheel is not compiled with it… why? Is there any way to fix it?
In general, everything continues to work as usual after the warning, but the warning is annoying.

thanks

Add USE_FLASH_ATTENTION=1 to your environment variables.
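
For example, in a Windows command prompt before launching Python (a minimal sketch; in PowerShell the equivalent would be $env:USE_FLASH_ATTENTION = "1"):

set USE_FLASH_ATTENTION=1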

Unfortunately this didn't help… I set the environment variable, but I still get the same warning.
I think this is not related to the environment or any other settings: the wheel was not compiled with FA support, so it's not supposed to run with any setting, is it?

Meanwhile, the code says that FA is available…

print("Device name:", torch.cuda.get_device_properties('cuda').name)
print("FlashAttention available:", torch.backends.cuda.flash_sdp_enabled())
print(f"torch version: {torch.__version__}")

Device name: NVIDIA RTX A2000
FlashAttention available: True
torch version: 2.2.0+cu121
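
Note that flash_sdp_enabled() only reports whether the flash backend is allowed, not whether the kernel was actually built into the wheel. A stricter check (a minimal sketch, assuming a CUDA device and the PyTorch 2.x sdp_kernel context manager) is to force the flash backend and see whether the call succeeds:

import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
try:
    # restrict SDPA to the flash backend only; this raises if the kernel is missing
    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
        F.scaled_dot_product_attention(q, k, v)
    print("flash attention kernel ran")
except RuntimeError as e:
    print("flash attention kernel unavailable:", e)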

It seems that your wheel is just not compiled with flash attention.
If you own the environment, just do:
pip3 install --force-reinstall --pre torch torchtext torchvision torchaudio torchrec --extra-index-url https://download.pytorch.org/whl/nightly/cu121

I have no idea what went wrong, but after installing this way and starting my code I get:
AssertionError: Torch not compiled with CUDA enabled
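
For reference, a quick way to check whether the installed wheel is a CUDA build at all (a minimal sketch):

import torch

print(torch.__version__)          # e.g. "2.2.0+cpu" would indicate a CPU-only wheel
print(torch.version.cuda)         # None on CPU-only builds
print(torch.cuda.is_available())  # False without a CUDA build or a visible GPU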

I'm rolling everything back and will wait for a stable release with complete FA support.

thanks

You are already very close to the answer; try removing --pre from the command above and installing again.
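
i.e., presumably the same command without the --pre flag:

pip3 install --force-reinstall torch torchtext torchvision torchaudio torchrec --extra-index-url https://download.pytorch.org/whl/nightly/cu121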

Thanks for your efforts… but unfortunately no progress.
I installed it, and while checking I got:
Device name: NVIDIA RTX A2000
FlashAttention available: True
torch version: 2.3.0.dev20240122+cu121

But then, when I start my code, I still get the same warning:
\AppData\Roaming\Python\Python311\site-packages\torch\nn\functional.py:5504: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at …\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:308.)

I guess the nightly build is still compiled without FlashAttention support…

Try compiling your own PyTorch then; following the compile-from-source guide in the PyTorch repo with the environment variable USE_FLASH_ATTENTION=1 set should suffice.
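
Roughly, the usual steps from the build-from-source instructions would look like this on Windows (a sketch; prerequisites such as the CUDA toolkit and a C++ toolchain are covered in the repo README):

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
set USE_FLASH_ATTENTION=1
python setup.py develop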

Thanks… but I don't think I'm such a hero yet :grinning:… no rush for FAv2, I'll wait.

FlashAttentionV2 is not available on Windows yet. See #108175.

But is this compilation warning related to FAv2 in particular, or to FA in general? 2.1.2 doesn't have this warning, and FA seems to work there.

I assume it's specific to FlashAttentionV2, but I'm also not using Windows and thus cannot verify the support of previous implementations.
Based on e.g. this comment it seems to be the case.