nn.TransformerEncoderLayer with d_model = 2

Hi,

when using nn.TransformerEncoderLayer() with d_model=2, I get unexpected (at least to me) results, something that does not happen when d_model != 2. For example:

import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=2, nhead=1, batch_first=True)
src = torch.rand(2, 4, 2)
out = encoder_layer(src)

gives me

tensor([[[ 1.0000, -1.0000],
         [ 1.0000, -1.0000],
         [ 1.0000, -1.0000],
         [ 1.0000, -1.0000]],

        [[ 1.0000, -1.0000],
         [-1.0000,  1.0000],
         [ 1.0000, -1.0000],
         [-1.0000,  1.0000]]], grad_fn=<NativeLayerNormBackward0>)

All values are stuck at either -1 or +1, something I do not observe when d_model != 2.
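In case it helps narrow things down, I can reproduce the same ±1 pattern with a standalone nn.LayerNorm over 2 features (I am guessing the final LayerNorm inside the encoder layer might be related, but I am not sure):

```python
import torch
import torch.nn as nn

# LayerNorm over just 2 features: each pair is normalized to
# zero mean and unit variance, so the output is always close to
# [-1, 1] or [1, -1] (before any learned affine transform).
ln = nn.LayerNorm(2, elementwise_affine=False)
x = torch.tensor([[3.0, 5.0]])
print(ln(x))  # ~tensor([[-1., 1.]])
```

This happens for any input pair with distinct values, so maybe the same thing applies to the encoder layer's output when d_model=2?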
Could someone explain it?
Thanks