Is it possible to use an existing mask to create a BlockMask? Our masks are quite complex, so it's a bother to have to write a mask_mod() function from scratch when we already have a function that builds a boolean mask tensor (I know the outputs aren't the same, but still).
Looking at the code in torch/nn/attention/flex_attention.py, I see a _create_block_mask_inner() that calls _convert_mask_to_block_mask(), which seems perfect for this.
But later on, _create_sparse_block_from_block_mask() is called with mask_mod as an argument, and it's not clear to me whether that mask_mod is actually used there.
So my question is: can one bypass writing a mask_mod() function, or not?
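
For reference, the closest workaround I've found so far is to wrap the precomputed tensor in a trivial mask_mod that just indexes into it. A minimal sketch (bool_mask here is a hypothetical stand-in for the output of our existing mask-building function):

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

B, H, Q_LEN, KV_LEN = 2, 4, 256, 256
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical stand-in for the output of our existing mask-building
# function: a dense boolean tensor of shape [B, H, Q_LEN, KV_LEN].
bool_mask = torch.rand(B, H, Q_LEN, KV_LEN, device=device) > 0.5

def mask_mod(b, h, q_idx, kv_idx):
    # No mask logic here -- just look the answer up in the tensor.
    return bool_mask[b, h, q_idx, kv_idx]

block_mask = create_block_mask(
    mask_mod, B=B, H=H, Q_LEN=Q_LEN, KV_LEN=KV_LEN, device=device
)
```

But this still goes through mask_mod and keeps the full dense tensor alive, so I'd still prefer a direct mask-tensor-to-BlockMask path if one exists.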