Dimensions of attention mask

nleroy917 · June 3, 2024, 10:44pm

Yeah, that was it. I appreciate it. I saw that, but it just didn’t seem like the answer; I suppose I should have just tried it. It doesn’t seem very well documented. From another thread:

The main difference (compared with ‘src_mask’) is that ‘src_key_padding_mask’ masks entire tokens. For example, when you set a value in the mask tensor to ‘True’, you are essentially saying that the corresponding token is a ‘pad token’ and should not be attended to by any other token.
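Since the thread is about mask dimensions, here is a minimal sketch of the two shapes side by side. It assumes the masks in question are the ones taken by PyTorch’s `nn.TransformerEncoder` (the `src_` prefix suggests that API). The sizes, the sequence lengths and the causal choice of `src_mask` are illustrative assumptions only. `src_key_padding_mask` is `(batch_size, seq_len)` with `True` marking padding, while `src_mask` is `(seq_len, seq_len)` and is shared across the batch. A 3-D `(batch_size * nhead, seq_len, seq_len)` mask is also accepted but is not shown here.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the thread)
d_model, nhead = 16, 4
batch_size, seq_len = 2, 5

torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)
encoder.eval()  # disable dropout so outputs are deterministic

x = torch.randn(batch_size, seq_len, d_model)

# Sequence 0 has 5 real tokens; sequence 1 has 3 real tokens followed by 2 pads.
lengths = torch.tensor([5, 3])

# src_key_padding_mask: (batch_size, seq_len). True = "this token is padding,
# no query may attend to it". It masks whole key columns, per sequence.
src_key_padding_mask = torch.arange(seq_len).unsqueeze(0) >= lengths.unsqueeze(1)

# src_mask: (seq_len, seq_len), the same for every sequence in the batch.
# Entry [i, j] = True means query position i may not attend to key position j.
# A causal (upper-triangular) mask is used here purely as an example.
src_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

with torch.no_grad():
    out = encoder(x, mask=src_mask, src_key_padding_mask=src_key_padding_mask)

    # Overwrite the padded positions of sequence 1 with different values:
    # the real tokens never attend to them, so their outputs should not change.
    x_changed = x.clone()
    x_changed[1, 3:] = torch.randn(2, d_model)
    out_changed = encoder(x_changed, mask=src_mask,
                          src_key_padding_mask=src_key_padding_mask)

print(src_key_padding_mask.shape, src_mask.shape, out.shape)
print(torch.allclose(out[1, :3], out_changed[1, :3], atol=1e-6))
```

Changing the padded inputs leaves the real tokens’ outputs untouched, which is the “should not be attended to” behaviour described in the quote. The outputs at the padded positions themselves are not meaningful and are usually ignored downstream.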