I am trying to use nn.TransformerEncoder with padded sequences. Here src
is a batch of 2 input sequences, each padded to length 8.
import torch
import torch.nn as nn
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
src = torch.rand(2, 8, 512)  # [batch, padded sequence length, d_model]
I understand that the src_key_padding_mask
argument marks the padded timesteps in the input: a 1 (True) means
the timestep is padding and should be ignored.
src_key_padding_mask = torch.tensor([[0,0,0,0,1,1,1,1],[0,0,0,0,0,0,1,1]], dtype=torch.bool)  # True = padded timestep
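For reference, the same mask can also be derived from the sequence lengths instead of being typed out by hand. This is only a minimal sketch: the names lengths, max_len and positions are hypothetical, and the lengths 4 and 6 are read off the mask above rather than taken from real data.

```python
import torch

# Hypothetical: number of real (non-padded) timesteps in each sequence.
lengths = torch.tensor([4, 6])
max_len = 8  # padded sequence length

# positions has shape [1, 8]; comparing against lengths of shape [2, 1]
# broadcasts to a [2, 8] boolean mask where True marks a padded timestep.
positions = torch.arange(max_len).unsqueeze(0)
src_key_padding_mask = positions >= lengths.unsqueeze(1)

print(src_key_padding_mask)
```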
My question is: do I also have to pass the mask
argument to mask out the vectors that were generated from the padded inputs, or is that inferred automatically from src_key_padding_mask?
out = transformer_encoder(src, src_key_padding_mask=src_key_padding_mask, mask=?)
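One way to probe this empirically is to overwrite the padded timesteps with different values and see whether the outputs at the real timesteps change when only src_key_padding_mask is passed. The following is a rough sketch under my own assumptions, not a statement about the intended API usage: the small d_model, dropout=0.0 (to make the forward pass deterministic), and the names src_perturbed, out_perturbed and valid are choices made just for this check.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumption: small sizes and dropout=0.0 so two forward passes are comparable.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=16, nhead=4, dropout=0.0, batch_first=True
)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

src = torch.rand(2, 8, 16)  # [batch, padded sequence length, d_model]
src_key_padding_mask = torch.tensor(
    [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1]], dtype=torch.bool
)

# Replace only the padded timesteps with unrelated random values.
src_perturbed = src.clone()
num_padded = int(src_key_padding_mask.sum())
src_perturbed[src_key_padding_mask] = torch.rand(num_padded, 16) * 10.0

with torch.no_grad():
    out = transformer_encoder(src, src_key_padding_mask=src_key_padding_mask)
    out_perturbed = transformer_encoder(
        src_perturbed, src_key_padding_mask=src_key_padding_mask
    )

valid = ~src_key_padding_mask  # True at real (non-padded) timesteps

# If padded keys are ignored, the real timesteps should be unaffected.
print("real timesteps unchanged:",
      torch.allclose(out[valid], out_perturbed[valid], atol=1e-5))
# The outputs at padded timesteps are still computed and may differ.
print("padded timesteps unchanged:",
      torch.allclose(out[~valid], out_perturbed[~valid], atol=1e-5))
```

This only checks whether the outputs at real timesteps depend on the padded inputs. It does not say anything about what the output rows at the padded positions contain, so those rows would presumably still need to be excluded in any later pooling or loss computation.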