Word embeddings using transformers (BERT, RoBERTa, or any other model) or fastText, GloVe, etc.?

I am training a model to learn the similarity between two job titles. I have around 2-3 million lines of text data scraped from multiple sources, and I want to use that data to train a model that learns title similarity. For example, "Finance Officer" should be closer to "Finance Lead" than to "Sales Officer".

Which model should I go with: fastText, GloVe, or a transformer-based model?
Note: at inference time I will provide only the titles and compute a similarity metric such as cosine similarity.

The sentence-transformers library has many pretrained models and is implemented in PyTorch. It might suit your needs.
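For instance, here is a minimal sketch of encoding titles and comparing them with cosine similarity using the library's `encode` and `util.cos_sim` helpers. The checkpoint name `all-MiniLM-L6-v2` is just one example of a pretrained model; you may want to evaluate a few and pick whichever works best on your titles:

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained sentence embedding model
# ("all-MiniLM-L6-v2" is one example checkpoint; many others exist)
model = SentenceTransformer("all-MiniLM-L6-v2")

titles = ["Finance Officer", "Finance Lead", "Sales Officer"]

# Encode all titles into fixed-size embedding vectors
embeddings = model.encode(titles, convert_to_tensor=True)

# Cosine similarity between the first title and the rest
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)
# If the model captures the semantics, "Finance Lead" should
# score higher than "Sales Officer" for "Finance Officer".
```

Since you have labeled pairs (or can construct them), you could also fine-tune such a model on your own title data; the library supports that, and fine-tuning on in-domain pairs usually improves similarity quality over the off-the-shelf checkpoints.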