Padding mask when 0 is an actual value

A solution that I can think of is to change the embedding dictionary so that 0 represents the token. This is what is usually done in the literature.