Understanding "mask" dtype from TransformerEncoder forward

You might be running into this limitation: the fast path is disabled when floating-point masks are used, so a boolean mask is the safer choice if you want it.
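In case it helps, here is a minimal sketch of the two mask conventions and how to convert between them. The helper names `bool_to_float_mask` and `float_to_bool_mask` are made up for illustration and are not part of PyTorch; the sketch assumes the usual convention that `True` in a boolean mask means "ignore this position", while a float mask is added to the attention scores (so `-inf` masks a position out). Whether the fast path is actually taken depends on your PyTorch version and other conditions, so treat this as a starting point, not a guarantee.

```python
import torch
import torch.nn as nn


def bool_to_float_mask(bool_mask: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
    """Hypothetical helper: True (ignore) -> -inf, False (keep) -> 0.0."""
    return torch.zeros(bool_mask.shape, dtype=dtype).masked_fill(bool_mask, float("-inf"))


def float_to_bool_mask(float_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: recover the boolean mask, assuming only 0.0 / -inf entries."""
    return float_mask == float("-inf")


# Batch of 2 sequences, length 3; True marks padding positions.
padding = torch.tensor([[False, False, True],
                        [False, True, True]])

additive = bool_to_float_mask(padding)    # float32 mask with 0.0 / -inf
roundtrip = float_to_bool_mask(additive)  # back to torch.bool

# Passing the boolean version to the encoder (sizes here are arbitrary).
layer = nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)
x = torch.randn(2, 3, 8)
out = encoder(x, src_key_padding_mask=padding)
```

If you already have a float mask made only of `0.0` and `-inf`, converting it back to `torch.bool` before calling the encoder keeps the same masking semantics while avoiding the floating-point mask restriction.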
