UserWarning in torch.nn.MultiheadAttention

Hey! I am using torch 2.0.1+cu117 on Linux and I am receiving a warning that says converting a mask without torch.bool dtype to bool will negatively affect performance, and that suggests using a boolean mask directly.

I am calling the module like this:

self._attention_1(key=x, value=x, query=x, key_padding_mask=padding_mask), and I have tested the padding mask to be a torch.Tensor of dtype torch.bool or torch.float32.

I guess the warning is raised inside that call, but I do not know whether I could be getting more performance out of it.
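For context, here is a minimal sketch of what the warning asks for: passing key_padding_mask already as torch.bool (where True marks padded key positions to ignore), so no dtype conversion happens inside the module. The dimensions and variable names here are made up for illustration; if your mask is float32 with nonzero entries at padded positions, converting it once with padding_mask.bool() before the call should behave the same and avoid the warning.

```python
import torch
import torch.nn as nn

# Hypothetical repro sketch; sizes and names are assumptions, not from the post.
embed_dim, num_heads, batch, seq = 16, 4, 2, 5
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(batch, seq, embed_dim)

# key_padding_mask as torch.bool: True marks padded key positions to ignore.
# Passing a float32 mask instead is what triggers the dtype-conversion warning.
padding_mask = torch.zeros(batch, seq, dtype=torch.bool)
padding_mask[:, -1] = True  # pretend the last token in each sequence is padding

out, _ = attn(query=x, key=x, value=x, key_padding_mask=padding_mask)
print(out.shape)  # torch.Size([2, 5, 16])
```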