Why does PyTorch's Transformer Encoder implementation have a norm argument?

Goldname · February 21, 2024, 6:07am

The TransformerEncoder has this argument: encoder_layer: an instance of the TransformerEncoderLayer() class (required).

But TransformerEncoderLayer already has LayerNorm built in. So why does TransformerEncoder take another norm argument, norm: the layer normalization component (optional)?
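
For reference, here is a minimal sketch of the setup I mean. The sizes (d_model=16, nhead=4, and so on) are arbitrary values picked for illustration, not taken from any real model; the point is only to show where the two kinds of normalization sit.

```python
import torch
import torch.nn as nn

d_model = 16  # illustrative size only (assumption, not from a real model)

# Each encoder layer already carries its own LayerNorm modules (norm1, norm2).
layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=32, batch_first=True
)

# The stack accepts one more, optional normalization module via `norm`.
final_norm = nn.LayerNorm(d_model)
encoder = nn.TransformerEncoder(layer, num_layers=2, norm=final_norm)

x = torch.randn(3, 5, d_model)  # (batch, sequence, features)
out = encoder(x)

# Inspect the norms inside the layer and the extra one held by the stack.
print(type(layer.norm1).__name__, type(layer.norm2).__name__)
print(encoder.norm is final_norm, tuple(out.shape))
```

So each layer has norm1 and norm2 inside it, and the stack additionally holds whatever module is passed as norm.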