Square_subsequent_mask BoolTensor alternative

Hi,
I’ve been implementing a transformer model and came across the function generate_square_subsequent_mask in both the PyTorch library and the Sequence-to-Sequence tutorial.

The current implementation generates a square mask matrix as follows:

def generate_square_subsequent_mask(self, sz: int) -> Tensor:
    """Generate a square mask for the sequence. The masked positions are filled with float('-inf').
    Unmasked positions are filled with float(0.0).
    """
    mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
    mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
    return mask

For sz=2 this would give the matrix [[0.0, -inf], [0.0, 0.0]].

I understand what it does, but I’d like to know whether there is any reason not to do the following instead:

def generate_square_subsequent_mask(self, sz: int) -> Tensor:
    """Generate a square mask for the sequence. The masked positions are filled with True.
    Unmasked positions are filled with False.
    """
    mask = (torch.triu(torch.ones(sz, sz)) == 0).transpose(0, 1)
    return mask

For sz=2 this would give the matrix [[False, True], [False, False]].

According to the nn.Transformer documentation, this should be equivalent, since "if a BoolTensor is provided, positions with True are not allowed to attend while False values will be unchanged."
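As a quick sanity check (this is just my own verification snippet, not from the tutorial), the -inf positions of the float mask line up exactly with the True positions of the boolean mask:

```python
import torch

sz = 4

# Float version: 0.0 where attention is allowed, -inf where it is blocked.
m = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
float_mask = m.float().masked_fill(m == 0, float('-inf')).masked_fill(m == 1, 0.0)

# Bool version: True where attention is blocked.
bool_mask = (torch.triu(torch.ones(sz, sz)) == 0).transpose(0, 1)

# The blocked positions should coincide in both representations.
print(torch.equal(torch.isinf(float_mask), bool_mask))  # True
```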

I’ve tested it on the tutorial and obtained the same results with both versions.
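For anyone who wants to reproduce the comparison without running the whole tutorial, here is a minimal sketch (assuming a reasonably recent PyTorch version; layer size and input shape are arbitrary) that feeds both masks through the same encoder layer:

```python
import torch
import torch.nn as nn


def float_mask(sz):
    # Original float mask: 0.0 = attend, -inf = blocked.
    m = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
    return m.float().masked_fill(m == 0, float('-inf')).masked_fill(m == 1, 0.0)


def bool_mask(sz):
    # Proposed boolean mask: True = blocked.
    return (torch.triu(torch.ones(sz, sz)) == 0).transpose(0, 1)


torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=8, nhead=2)
layer.eval()  # disable dropout so both runs are deterministic

x = torch.randn(4, 1, 8)  # (seq_len, batch, d_model)
with torch.no_grad():
    out_float = layer(x, src_mask=float_mask(4))
    out_bool = layer(x, src_mask=bool_mask(4))

print(torch.allclose(out_float, out_bool))
```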

Using the BoolTensor is simpler and slightly faster at mask creation. Is there an underlying reason not to recommend it, or to prefer the float mask over the boolean one?

Thanks!
