nn.TransformerEncoderLayer produces exactly the same output for a given src, no matter what src_key_padding_mask or src_mask I pass.
Likewise, the output of nn.TransformerDecoderLayer is not affected by any of tgt_mask, memory_mask, tgt_key_padding_mask, or memory_key_padding_mask.
Does anyone know what is going wrong? How can I make the masks work the right way? Thanks a lot.
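For reference, my understanding from the nn.Transformer documentation is that src_mask has shape (S, S) and, when boolean, True marks a position a query is not allowed to attend to, while src_key_padding_mask has shape (N, S) with True marking padding tokens. A minimal sketch of how I build the masks (the variable names are mine):

import torch

S, N = 4, 3  # source length and batch size, matching the example below

# boolean attention mask, shape (S, S): True = "may not attend here" (causal here)
causal_mask = torch.triu(torch.ones(S, S, dtype=torch.bool), diagonal=1)

# boolean padding mask, shape (N, S): True = "this token is padding"
padding_mask = torch.zeros(N, S, dtype=torch.bool)
padding_mask[:, -1] = True  # e.g. pretend the last token of each sequence is padding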
[Input0]:
import torch
import torch.nn as nn
encoder_layer = nn.TransformerEncoderLayer(d_model=6, nhead=2)
encoder_layer.eval()  # eval() disables dropout, so results are deterministic
src = torch.ones((4, 3, 6))  # (seq_len, batch, d_model) -- batch_first defaults to False
encoder_layer(src)
[Output0]:
tensor([[[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]]],
       grad_fn=<NativeLayerNormBackward0>)
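One thing I notice in [Output0]: every position of every batch element already gets the identical vector, which presumably follows from every input token being identical (src is all ones).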
[Input2]:
print(encoder_layer(src, src_mask=torch.zeros((4, 4)).bool()))  # all False: nothing masked
print(encoder_layer(src, src_mask=torch.tensor(
    [[0, 1, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
).bool()))  # causal mask: True above the diagonal
print(encoder_layer(src, src_mask=torch.tensor(
    [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]
).bool()))  # an arbitrary mask pattern
[Output2]:
tensor([[[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]]],
       grad_fn=<NativeLayerNormBackward0>)
tensor([[[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]]],
       grad_fn=<NativeLayerNormBackward0>)
tensor([[[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]],

        [[ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910],
         [ 0.9927,  0.2251, -1.5199,  1.2508, -1.0397,  0.0910]]],
       grad_fn=<NativeLayerNormBackward0>)
All of the above produce exactly the same result! What is going wrong?
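For what it's worth, here is the minimal check I would expect to distinguish the masks (a sketch of my own; I'm only assuming the boolean convention True = masked): feed random data instead of all ones and compare masked vs. unmasked outputs.

import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=6, nhead=2)
layer.eval()

src = torch.randn(4, 3, 6)  # random input instead of torch.ones
causal_mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)

with torch.no_grad():
    same = torch.allclose(layer(src), layer(src, src_mask=causal_mask))
print(same)  # shows whether the mask changes the output at all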