What are the differences between Flash Attention and Memory Efficient Attention?

I’m learning about PyTorch and the Transformer architecture. While reading the PyTorch source code, I noticed that if I don’t enable the USE_FLASH_ATTENTION compile flag, memory-efficient attention won’t be compiled into PyTorch either. Does this mean that the implementation of memory-efficient attention depends on the flash attention implementation? I’m also confused about the specific differences between memory-efficient attention and flash attention at the implementation level.
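For context, here is roughly how I’m calling attention. As I understand it, `scaled_dot_product_attention` dispatches at runtime to one of several backends (math, flash attention, memory-efficient attention) depending on the hardware, dtype, and how PyTorch was built — the commented-out context manager below is my understanding of how backend selection can be restricted on CUDA builds, so please correct me if that’s wrong:

```python
import torch
import torch.nn.functional as F

# Small example tensors: batch=2, heads=4, seq_len=8, head_dim=16
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)

# Dispatches to one of the available attention backends
# (math / flash attention / memory-efficient attention).
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 8, 16])

# On a CUDA build, I believe the allowed backends can be restricted
# via this context manager, e.g. to force memory-efficient attention:
# with torch.backends.cuda.sdp_kernel(enable_flash=False,
#                                     enable_math=False,
#                                     enable_mem_efficient=True):
#     out = F.scaled_dot_product_attention(q, k, v)
```

So my question is partly about what actually happens inside this dispatch when the flash backend was never compiled in.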