FlexAttention customizability

Hello. I am trying to implement the following two features and I am curious whether they are possible with FlexAttention.

  1. I would like to use a different value for softmax numerical stabilization. Currently, softmax computes exp(score - max_score) for numerical stability, which is inefficient because it requires two passes over the scores: one to find the maximum and one to accumulate the sum. If max_score could be replaced by a fixed constant, the implementation would be simpler, and combined with tanh soft capping there would be no risk of overflow either (see the sketch after this list for the soft-capping part).
  2. Would it be possible to implement Softmax1 from “Attention Is Off By One”? I am aware it does not have much effect in practice, but I dislike using an incorrect equation. The exact equation I mean is written out below the list.
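To make the soft-capping half of point 1 concrete, here is a minimal, untested sketch of what I have in mind using a `score_mod` (the `SOFTCAP` value and the tensor shapes are just placeholders). The part I do not see how to express this way is replacing the running-max subtraction inside the kernel with a constant, which is really what I am asking about.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

SOFTCAP = 50.0  # placeholder capping constant

def tanh_softcap(score, b, h, q_idx, kv_idx):
    # Bound every score to (-SOFTCAP, SOFTCAP) before the softmax, so
    # exp(score - C) cannot overflow for any fixed constant C >= SOFTCAP.
    return SOFTCAP * torch.tanh(score / SOFTCAP)

# Dummy inputs: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 128, 64, device="cuda") for _ in range(3))
out = flex_attention(q, k, v, score_mod=tanh_softcap)
```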
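For point 2, this is the equation I mean by Softmax1, written as a plain PyTorch reference just to pin down the definition (not a FlexAttention kernel):

```python
import torch

def softmax1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j))
    # Equivalent to appending a zero logit, taking a regular softmax, and
    # dropping that extra slot, so the usual max-subtraction trick still works.
    m = torch.clamp(x.amax(dim=dim, keepdim=True), min=0.0)
    e = torch.exp(x - m)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))
```

Being able to express that extra "+1" in the denominator (or, equivalently, an attention-sink key with a fixed logit of 0) is what I am asking about.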