Scaled_dot_product_attention is not numerically stable

Thanks for the reply, good question

np.testing.assert_allclose uses rtol=1e-07 (and atol=0) by default, so I assumed that's the accuracy two computations should match to if they are supposed to be identical. But this might be wrong.
I was using this default to test whether a custom transformer implementation is identical to a timm.VisionTransformer, but timm has since switched to F.scaled_dot_product_attention, so my test is now failing.
I guess I can relax the tolerance to 1e-06.
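For reference, here is a minimal sketch of the kind of comparison I mean: a manual attention computation vs. F.scaled_dot_product_attention on the same random inputs, checked with a relaxed tolerance. The shapes and the rtol/atol values are just illustrative choices, not anything prescribed by timm.

```python
import numpy as np
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# (batch, heads, seq_len, head_dim) -- arbitrary small sizes for illustration
q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# Manual attention, as a custom transformer implementation would compute it
scale = q.shape[-1] ** -0.5
attn = (q @ k.transpose(-2, -1)) * scale
manual = attn.softmax(dim=-1) @ v

# Fused/optimized path that timm now uses
fused = F.scaled_dot_product_attention(q, k, v)

# The assert_allclose default (rtol=1e-07, atol=0) can fail here even though
# both results are mathematically the same; a slightly relaxed tolerance
# accounts for the different computation order in the fused kernel.
np.testing.assert_allclose(manual.numpy(), fused.numpy(), rtol=1e-6, atol=1e-6)
```

The two paths differ only in floating-point summation order (and possibly the backend kernel chosen), so the discrepancy is on the order of float32 rounding error rather than an actual bug.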