I am running some experiments where I need access to the attention logits (the attention scores before the softmax is applied). One example would be CoPE (Contextual Position Encoding: Learning to Count What's Important, arXiv:2405.18719).
Any chance we could get an option to skip the softmax and return those raw scores?
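
To be concrete about what I mean, here is a minimal PyTorch sketch of the tensor in question; names and shapes are purely illustrative, not tied to any particular API:

```python
# Minimal sketch of the requested quantity: the raw attention logits
# (scaled QK^T) before softmax. Shapes/names here are illustrative only.
import math
import torch

def attention_logits(q, k):
    """q, k: (batch, heads, seq_len, head_dim) -> logits: (batch, heads, seq_len, seq_len)."""
    return torch.einsum("bhqd,bhkd->bhqk", q, k) / math.sqrt(q.shape[-1])

q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
logits = attention_logits(q, k)        # pre-softmax scores (what I'd like exposed)
probs = torch.softmax(logits, dim=-1)  # what the fused kernel currently computes internally
```

CoPE-style methods operate on these logits directly (e.g. gating them to build contextual positions) rather than only consuming the normalized probabilities, which is why an option to stop before the softmax would be useful.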