Using the TransformerEncoder with padded sequences

Haziq · February 4, 2022, 2:39pm

I am trying to use the TransformerEncoder with padded sequences. Here src is a batch of 2 input sequences, each padded to length 8.

import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
src = torch.rand(2, 8, 512)  # [batch, padded sequence length, d_model]

I understand that src_key_padding_mask marks the padded timesteps in the input: a 1 (True) at a position means that timestep should be ignored.

src_key_padding_mask = torch.tensor([[0,0,0,0,1,1,1,1],[0,0,0,0,0,0,1,1]], dtype=torch.bool)  # True = padding
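
For what it's worth, the same mask does not have to be typed out by hand. Below is a minimal sketch of one way to build it from per-sequence lengths; the `lengths` tensor and the `max_len` name are hypothetical and only stand in for however the true lengths are tracked in a real pipeline.

```python
import torch

# Hypothetical true lengths of the two sequences in the batch (before padding).
lengths = torch.tensor([4, 6])
max_len = 8  # padded sequence length, as in the example above

# A position j is padding when j >= the length of that sequence.
positions = torch.arange(max_len).unsqueeze(0)            # shape [1, max_len]
src_key_padding_mask = positions >= lengths.unsqueeze(1)  # shape [batch, max_len], dtype bool

# Printed as integers, this shows the same 0/1 pattern as the hand-written mask.
print(src_key_padding_mask.int())
```

The comparison produces a boolean tensor directly, so no separate dtype conversion is needed.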

My question is: do I also have to pass the mask argument to mask out the output vectors produced at the padded positions, or is that inferred automatically from src_key_padding_mask?

out = transformer_encoder(src, src_key_padding_mask=src_key_padding_mask, mask=?)
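
One way to probe this empirically is to run the encoder twice with only src_key_padding_mask set, changing nothing but the values stored in the padded slots, and then compare the outputs at the real timesteps. The sketch below does that. It assumes much smaller sizes than in the post (d_model=16, nhead=2, 2 layers) purely to keep it quick, and the names `src_alt`, `valid`, and `same_on_valid` are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Smaller sizes than in the post, only to keep the check fast (assumption).
d_model, nhead = 16, 2
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
transformer_encoder.eval()  # disable dropout so the two runs are comparable

src = torch.rand(2, 8, d_model)
src_key_padding_mask = torch.tensor(
    [[0, 0, 0, 0, 1, 1, 1, 1],
     [0, 0, 0, 0, 0, 0, 1, 1]], dtype=torch.bool)

# Same batch, but with different random values written into the padded slots.
src_alt = src.clone()
num_padded = int(src_key_padding_mask.sum())
src_alt[src_key_padding_mask] = torch.rand(num_padded, d_model)

# No `mask` argument is passed in either call.
out = transformer_encoder(src, src_key_padding_mask=src_key_padding_mask).detach()
out_alt = transformer_encoder(src_alt, src_key_padding_mask=src_key_padding_mask).detach()

# Compare only the outputs at the real (non-padded) timesteps.
valid = ~src_key_padding_mask
same_on_valid = torch.allclose(out[valid], out_alt[valid], atol=1e-5)
print(same_on_valid)
```

If `same_on_valid` comes out True, the padded inputs are not leaking into the real timesteps even though no mask was given. Note that the encoder still returns a vector for every padded position, so those rows would have to be excluded separately in whatever comes next (pooling, loss, and so on).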