TransformerEncoderLayer produces an all-NaN tensor when src_key_padding_mask is all True

The issue is as simple as the title suggests: I do not understand why the TransformerEncoderLayer module produces a tensor of NaN values when given an all-True src_key_padding_mask.

Minimal working examples

WITH MASK:

import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=1, nhead=1)
# Unbatched input of shape (seq_len=5, d_model=1)
tensor = torch.tensor([i for i in range(5)], dtype=torch.float32).unsqueeze(1)
# Key padding mask of shape (seq_len=5); True marks a position to ignore
mask = torch.tensor([True for _ in range(5)], dtype=torch.bool)
model(tensor, src_key_padding_mask=mask)

>>> tensor([[nan],
            [nan],
            [nan],
            [nan],
            [nan]], grad_fn=<NativeLayerNormBackward0>)
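
My guess is that this comes from the attention softmax: if every key position is masked out, there is nothing left to normalize over and the softmax returns NaN. A minimal sketch of that behavior (assuming the padding mask is applied as a -inf additive bias on the attention scores, which is an assumption on my part):

import torch

# Every position masked -> exp(-inf) / sum(exp(-inf)) = 0 / 0 = nan
scores = torch.full((1, 5), float("-inf"))
print(torch.softmax(scores, dim=-1))
# tensor([[nan, nan, nan, nan, nan]])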

WITHOUT MASK:

import torch
import torch.nn as nn

model = nn.TransformerEncoderLayer(d_model=1, nhead=1)
tensor = torch.tensor([i for i in range(5)], dtype=torch.float32).unsqueeze(1)
# Same input, but without src_key_padding_mask
model(tensor)

>>> tensor([[0.],
            [0.],
            [0.],
            [0.],
            [0.]], grad_fn=<NativeLayerNormBackward0>)
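
As a side note, the all-zero output in the unmasked case is expected here: with d_model = 1, the final LayerNorm normalizes each length-1 feature vector, so (x - mean(x)) is always 0 regardless of the input. A quick check:

import torch
import torch.nn as nn

# LayerNorm over a single feature: (x - mean) is 0 for every element
ln = nn.LayerNorm(1)
print(ln(torch.tensor([[3.0], [7.0]])))
# tensor([[0.], [0.]], grad_fn=<NativeLayerNormBackward0>)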