Why does BatchNorm1d give a different result from LayerNorm?

Given a 2-dimensional input tensor x with x.shape = (B, D), shouldn't we expect BatchNorm1d(D) to produce the same result as LayerNorm(D) during training? Why are their results different?
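For concreteness, here is a minimal repro sketch of the comparison I have in mind, assuming PyTorch's torch.nn modules with their default affine parameters (weight=1, bias=0) and an arbitrary seed/shape chosen just for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

B, D = 4, 8
x = torch.randn(B, D)

# Both modules are in training mode by default, and with default
# affine parameters only the normalization itself can differ.
bn = nn.BatchNorm1d(D)
ln = nn.LayerNorm(D)

out_bn = bn(x)
out_ln = ln(x)

print(torch.allclose(out_bn, out_ln))   # False
print((out_bn - out_ln).abs().max())    # non-trivial difference
```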