Value of [CLS] Token for Transformer Encoders

The typical approach (GPT, vision transformer, …) is to make the [CLS] token a learned parameter, e.g. here:
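In case it helps, here is a minimal sketch of what that could look like in PyTorch. It is not taken from any particular codebase: the class name `ClsEncoder`, the sizes, and the small normal init are all my own assumptions for illustration. The idea is just that the [CLS] value is an `nn.Parameter`, gets prepended to every sequence, and is trained along with everything else.

```python
import torch
import torch.nn as nn


class ClsEncoder(nn.Module):
    """Hypothetical encoder that prepends a learned [CLS] token to its input."""

    def __init__(self, d_model=32, nhead=4, num_layers=2):
        super().__init__()
        # The [CLS] value is an ordinary learned parameter of shape (1, 1, d_model).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        # Small random init; the std of 0.02 is an assumption, not a requirement.
        nn.init.normal_(self.cls_token, std=0.02)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):
        # x: (batch, seq_len, d_model), i.e. already embedded inputs.
        # Share the same learned token across the batch.
        cls = self.cls_token.expand(x.size(0), -1, -1)
        # Prepend it, so the sequence length becomes seq_len + 1.
        x = torch.cat([cls, x], dim=1)
        out = self.encoder(x)
        # Use the output at the [CLS] position as the sequence summary.
        return out[:, 0]


model = ClsEncoder()
summary = model(torch.randn(8, 10, 32))
print(summary.shape)  # torch.Size([8, 32])
```

Because `cls_token` is registered as a parameter, it shows up in `model.parameters()` and the optimizer updates it like any other weight.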

Best regards

Thomas