To understand exactly how the fill-mask pipeline works in transformers, I decided to trace the computations. The BERT embeddings module calls layer norm once on the embedding output. However, F.layer_norm in the normalization module is then called several more times right after that call. I expected F.layer_norm to be called only once. Why does this happen?
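
For reference, here is a minimal sketch of one way the calls could be traced. It assumes the extra calls come from other nn.LayerNorm modules in the model, which is only a guess. The helper `trace_layer_norm_calls` and the `ToyModel` / `ToyBlock` classes are hypothetical names made up for illustration; the toy model is just a stand-in so the snippet runs without downloading any weights. Forward hooks record the name of each nn.LayerNorm module as it runs, so the output shows which submodule is responsible for each call.

```python
import torch
from torch import nn


def trace_layer_norm_calls(model, *args, **kwargs):
    """Run one forward pass and return the names of the nn.LayerNorm
    modules that were executed, in call order."""
    calls = []
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.LayerNorm):
            # Bind `name` through a default argument so each hook
            # records its own module name.
            def hook(_module, _inputs, _output, name=name):
                calls.append(name)

            handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(*args, **kwargs)
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return calls


class ToyBlock(nn.Module):
    """Hypothetical block: a linear layer, a residual add, then LayerNorm."""

    def __init__(self, hidden):
        super().__init__()
        self.dense = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x):
        return self.norm(x + self.dense(x))


class ToyModel(nn.Module):
    """Hypothetical stand-in: one LayerNorm after the embedding,
    plus one inside each stacked block."""

    def __init__(self, vocab=100, hidden=16, num_blocks=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.embed_norm = nn.LayerNorm(hidden)
        self.blocks = nn.ModuleList([ToyBlock(hidden) for _ in range(num_blocks)])

    def forward(self, ids):
        x = self.embed_norm(self.embed(ids))
        for block in self.blocks:
            x = block(x)
        return x


toy = ToyModel()
calls = trace_layer_norm_calls(toy, torch.tensor([[1, 2, 3]]))
print(calls)
```

The same helper should work on the model object held by the pipeline (passing the tokenized inputs as keyword arguments), since it only relies on `named_modules()` and `register_forward_hook()`. The printed names would then show whether the additional F.layer_norm calls belong to the embeddings or to other submodules.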