Are flex attention and SDPA outputs natively equivalent?

I’d expect you to be able to use regular batched inputs. Maybe the examples here would be helpful: attention-gym/examples/flex_attn.ipynb in the pytorch-labs/attention-gym repo on GitHub.
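
As a minimal sketch of what "regular batched inputs" means here (not from the thread, just an illustration): pass the same `(B, H, S, D)` tensors to both `torch.nn.functional.scaled_dot_product_attention` and `torch.nn.attention.flex_attention.flex_attention` with no `score_mod` or mask, and compare the outputs. The shapes, tolerances, and device assumptions below are my own; `flex_attention` requires a recent PyTorch (2.5+) and in practice works best on CUDA.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention.flex_attention import flex_attention

# Arbitrary sizes for illustration: batch, heads, sequence length, head dim.
B, H, S, D = 2, 8, 128, 64
device = "cuda" if torch.cuda.is_available() else "cpu"

q = torch.randn(B, H, S, D, device=device)
k = torch.randn(B, H, S, D, device=device)
v = torch.randn(B, H, S, D, device=device)

# Plain SDPA: full attention, no mask, no dropout.
out_sdpa = F.scaled_dot_product_attention(q, k, v)

# flex_attention with no score_mod / block_mask should reduce to the same
# full attention, so the two outputs should agree up to floating-point error.
out_flex = flex_attention(q, k, v)

print(torch.allclose(out_sdpa, out_flex, atol=1e-5, rtol=1e-5))
```

If the outputs differ by more than a small tolerance with a non-trivial `score_mod` or `block_mask`, that's expected, since SDPA isn't computing the same thing anymore.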