Hi,
when using nn.TransformerEncoderLayer() with d_model=2, I get unexpected (at least to me) results, something that does not happen when d_model != 2. For example:

import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=2, nhead=1, batch_first=True)
src = torch.rand(2, 4, 2)
out = encoder_layer(src)
print(out)
gives me

tensor([[[ 1.0000, -1.0000],
         [ 1.0000, -1.0000],
         [ 1.0000, -1.0000],
         [ 1.0000, -1.0000]],

        [[ 1.0000, -1.0000],
         [-1.0000,  1.0000],
         [ 1.0000, -1.0000],
         [-1.0000,  1.0000]]], grad_fn=<NativeLayerNormBackward0>)
All values are stuck at either -1 or +1, something I do not observe when d_model != 2.
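My guess (and I may be wrong) is that this comes from the LayerNorm inside the encoder layer: normalizing over a feature dimension of only 2 forces each pair of values to zero mean and unit variance, which (with the default weight=1, bias=0) can only produce approximately +1 and -1. A minimal sketch of that effect in isolation:

```python
import torch
import torch.nn.functional as F

# LayerNorm over a last dimension of size 2: each pair is rescaled to
# zero mean and unit variance, so the two outputs land near +1 and -1
# regardless of the input values (as long as the pair isn't constant,
# so that eps is negligible).
x = torch.tensor([[0.3, 0.9],
                  [5.0, -2.0],
                  [100.0, 1.0]])
out = F.layer_norm(x, normalized_shape=(2,))
print(out)  # every row is approximately [-1, 1] or [1, -1]
```

Is this the right explanation, or is something else going on?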
Could someone explain it?
Thanks