The typical thing (GPT, vision transformer, …) is to make it a learned parameter, e.g. here:
Best regards
Thomas