Dear all,
I am working with the TransformerEncoder module. I understand that the “mask” parameter of the forward function can be a boolean tensor, where True indicates that attending to the corresponding token is forbidden and False indicates the opposite.
In my case, I am working with a sequence-to-sequence mask. Initially, I was converting my mask to float with the following code:
mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
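For context, here is a minimal sketch of how I understand the two mask types end up being applied inside attention (assuming the usual masked-softmax formulation; the mask and scores below are toy values):

import torch
import torch.nn.functional as F

T = 4
# Toy boolean mask following the convention above: True = forbidden to attend.
bool_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
# Float equivalent: -inf where attending is forbidden, 0.0 where it is allowed.
float_mask = torch.zeros(T, T).masked_fill(bool_mask, float('-inf'))

scores = torch.randn(T, T)  # dummy attention scores
# A boolean mask fills the forbidden scores with -inf before the softmax,
# while a float mask is added to the scores; both zero out the same weights.
attn_bool = F.softmax(scores.masked_fill(bool_mask, float('-inf')), dim=-1)
attn_float = F.softmax(scores + float_mask, dim=-1)
print(torch.allclose(attn_bool, attn_float))  # expect True

On this toy example the two formulations agree, but I am not sure whether this holds in general inside TransformerEncoder.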
However, I realized this conversion takes a lot of time (~0.60 seconds) when the context length is > 50. So I removed the float conversion and kept the boolean values, and it takes only ~0.06 seconds for an equivalent context length.
Note that my masks' shape is (batch_size, T, T) because each sample in the batch has a different mask.
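In case it is relevant, here is how I expand the per-sample mask to the per-head shape that, as far as I understand the docs, nn.MultiheadAttention expects for a 3-D attn_mask (batch_size, num_heads and T below are placeholder values):

import torch

batch_size, num_heads, T = 2, 8, 5  # placeholder sizes
per_sample_mask = torch.zeros(batch_size, T, T, dtype=torch.bool)
# The attention layer expects a 3-D mask of shape
# (batch_size * num_heads, T, T), so each sample's mask is repeated
# once per head along the batch dimension.
per_head_mask = per_sample_mask.repeat_interleave(num_heads, dim=0)
print(per_head_mask.shape)  # torch.Size([16, 5, 5])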
My question is: what is the key difference between a boolean and a float mask? Do they give the same results?
Thanks!