PyTorch transformers implementation vs other repos

Is PyTorch’s implementation of transformers state of the art, or is it better to use a different implementation?

Well, anything you can find “officially implemented” is rarely SOTA.
Keep in mind that by the time an architecture gets ported into PyTorch itself, researchers have already released newer transformer variants.
A different question is whether it is worth spending time on rougher research code or not.
PyTorch’s transformers are straightforward to use.
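As a minimal sketch of how straightforward the built-in modules are, here is a small example using `nn.TransformerEncoder`; the dimensions (`d_model=512`, `nhead=8`, 6 layers) are just illustrative defaults, not a recommendation:

```python
import torch
import torch.nn as nn

# One standard encoder layer; stack it 6 times with nn.TransformerEncoder.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Default input layout is (seq_len, batch, d_model) when batch_first=False.
src = torch.rand(10, 32, 512)
out = encoder(src)
print(out.shape)  # torch.Size([10, 32, 512])
```

If you need a recent architecture variant, you would still have to look at research repos, but for a standard encoder/decoder stack this is a few lines.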