Why does FaceNet use un-normalized feature embeddings for training?

I am following the FaceNet code in the GitHub repository bubbliiiing/facenet-pytorch (described there as a facenet-pytorch library that can be used to train your own face recognition model). During training, it uses the un-normalized feature embedding and adds a classifier on top of it. During inference, however, it uses the normalized feature embedding.
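To make the two paths concrete, here is a minimal framework-free sketch of the pattern as I understand it. It is not the repository's code: the names `l2_normalize`, `linear`, and `forward`, the `mode` argument, and the toy weights are all hypothetical, chosen only for illustration.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def linear(vec, weights, bias):
    """Plain fully connected layer: one logit per row of weights."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def forward(embedding, weights, bias, mode="train"):
    """Hypothetical head mirroring the two paths described above."""
    if mode == "train":
        # Training path: the classifier sees the raw (un-normalized) embedding.
        return linear(embedding, weights, bias)
    # Inference path: return the unit-length embedding for distance comparison.
    return l2_normalize(embedding)


# Toy values (assumed): a 2-D embedding and an identity classifier.
raw = [3.0, 4.0]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]

train_logits = forward(raw, W, b, mode="train")
inference_embedding = forward(raw, W, b, mode="predict")
```

In this sketch the training logits depend on the length of the embedding, while the inference output always has length 1, so the magnitude information is only visible on the training path.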

My question is: why not use the normalized feature embedding directly for training? What is the theory behind using the un-normalized feature embedding for training and the normalized one for inference?

BTW, I understand that normalization puts all features on the same scale and works better for inference. Please make sure to answer why the normalized feature embedding is not used for training.