Transformer Mask Implementation

I was going over the Transformer encoder for language modelling example here: Sequence-to-Sequence Modeling with nn.Transformer and TorchText — PyTorch Tutorials 1.8.1+cu102 documentation

The following function is used to generate the mask:

def generate_square_subsequent_mask(self, sz):
    mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
    mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
    return mask
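
For reference, here is a minimal standalone sketch of what this function produces (assuming PyTorch is installed; I dropped self so it runs outside the class):

import torch

def generate_square_subsequent_mask(sz):
    mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
    mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
    return mask

# Allowed (past and current) positions get 0.0, future positions get -inf:
print(generate_square_subsequent_mask(3))
# tensor([[0., -inf, -inf],
#         [0., 0., -inf],
#         [0., 0., 0.]])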

Shouldn’t the second line be

mask = mask.float().masked_fill(mask == 0, float(1.0)).masked_fill(mask == 1, float(0.0))

instead of mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))?
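
For context, my understanding is that nn.Transformer consumes a float attn_mask additively: the mask is added to the raw attention scores before the softmax. A quick sketch of how the two fill values would differ (my own illustration with made-up scores, not from the tutorial):

import torch
import torch.nn.functional as F

scores = torch.tensor([2.0, 1.0, 0.5])  # hypothetical raw attention scores for one query

# Fill value from the tutorial: -inf on the masked (future) position.
inf_mask = torch.tensor([0.0, 0.0, float('-inf')])
print(F.softmax(scores + inf_mask, dim=-1))
# tensor([0.7311, 0.2689, 0.0000]) -- the masked position gets zero attention weight

# Proposed fill value: 1.0 on the masked position.
one_mask = torch.tensor([0.0, 0.0, 1.0])
print(F.softmax(scores + one_mask, dim=-1))
# roughly tensor([0.5065, 0.1863, 0.3072]) -- the masked position still receives weight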