Attention does not seem to be applied across positions in PyTorch's TransformerEncoderLayer / MultiheadAttention

Changing the values at one position of my input does not affect the outputs at the other positions of my transformer encoder. I made an isolated test in PyTorch:

import numpy as np
import torch
import torch.nn as nn

# My encoder layer
encoder_layer = nn.TransformerEncoderLayer(d_model=8, nhead=2)
# Turn off dropout
encoder_layer.eval()
# Random input
src = torch.rand(2, 10, 8)
# Predict the output
out_0 = encoder_layer(src)
# Change the values at one of the positions (position 3 in this case)
src[:, 3, :] += 1
# Predict the output once again
out_1 = encoder_layer(src)
# Check at which positions the outputs differ between the two cases
# (summed over the embedding dimension)
print(np.sum(np.abs(out_0.detach().numpy()), axis=-1) - np.sum(np.abs(out_1.detach().numpy()), axis=-1))

Output:

[[ 0.   0.   0.  -0.15470695  0.   0.   0.   0.   0.   0.  ]
 [ 0.   0.   0.  -0.27988768  0.   0.   0.   0.   0.   0.  ]]

However, the same kind of test does work when I run it in TensorFlow (a rough sketch of such a check follows the output below).
Output in TensorFlow:

[[6.4196725 6.775745  6.946576  7.26213   6.473065  5.520765  6.201167
  7.1266503 6.3147016 6.614853 ]
 [5.565378  7.030789  6.768366  6.6065626 6.7277775 7.480627  6.6785836
  6.4560523 6.4248576 6.6436586]]
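
I didn't include my TensorFlow code here, so the following is only a sketch of a comparable check, not my exact code. It uses tf.keras.layers.MultiHeadAttention, which expects batch-first input of shape (batch, seq, features) by default, so a change at one sequence position propagates to all positions through self-attention:

import numpy as np
import tensorflow as tf

# Keras attention layers are batch-first by default: (batch, seq, features)
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)

src = tf.random.uniform((2, 10, 8))
# Self-attention: query = value = src
out_0 = mha(src, src)

# Perturb sequence position 3
src_mod = src.numpy()
src_mod[:, 3, :] += 1
out_1 = mha(tf.constant(src_mod), tf.constant(src_mod))

# The change shows up at every position, not just position 3
print(np.sum(np.abs(out_0.numpy()), axis=-1) - np.sum(np.abs(out_1.numpy()), axis=-1))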

I reposted this from my Stack Overflow post, but this might be the better place for the question.

I already got my answer! I have to pass batch_first=True: by default, nn.TransformerEncoderLayer expects input of shape (seq_len, batch, d_model), so my (2, 10, 8) tensor was interpreted as a length-2 sequence with batch size 10, and src[:, 3, :] only changed one batch element instead of one sequence position.
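
For completeness, the corrected test looks roughly like this (assuming a PyTorch version recent enough to support the batch_first argument):

import numpy as np
import torch
import torch.nn as nn

# batch_first=True tells the layer that src is (batch, seq, d_model)
encoder_layer = nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
encoder_layer.eval()

src = torch.rand(2, 10, 8)
with torch.no_grad():
    out_0 = encoder_layer(src)
    # Now this really perturbs sequence position 3 of every batch element
    src[:, 3, :] += 1
    out_1 = encoder_layer(src)

# The difference now appears at every position, as expected from self-attention
print(np.sum(np.abs(out_0.numpy()), axis=-1) - np.sum(np.abs(out_1.numpy()), axis=-1))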