Transformer for classification

I want to use a transformer for classification. I want to feed a sentence in as input and, at the output, classify it into one of two classes.
Do I have to use a decoder, or can I apply a softmax directly to the output of the encoder?

If I input 10 words, I get 10 vectors at the output of the encoder. I then need to turn these 10 vectors into scores for 2 classes. How can I do that? I read somewhere that you can average them. Others write that you can put a CNN or an RNN on top. I don't know which approach is right. How does BERT do classification?
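
To make the question concrete, here is a minimal sketch of what I understand by "averaging", written in plain Python with no deep learning library. Everything in it is an assumption for illustration: the encoder output is random numbers standing in for 10 real vectors, the hidden size of 8 is arbitrary, and the weights `W` and `b` are untrained placeholders. It also shows the alternative of keeping only the vector at the first position, which, as far as I can tell, is what BERT-style classifiers do with the special [CLS] token.

```python
import math
import random


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row of coefficients per class."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_k
        for row, b_k in zip(weights, bias)
    ]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
seq_len, hidden, num_classes = 10, 8, 2  # illustrative sizes only

# Stand-in for the encoder output: one vector per input word.
encoder_out = [
    [random.gauss(0, 1) for _ in range(hidden)] for _ in range(seq_len)
]

# Untrained placeholder weights for the classification layer.
W = [[random.gauss(0, 0.1) for _ in range(hidden)] for _ in range(num_classes)]
b = [0.0] * num_classes

# Option 1: average all 10 vectors into a single vector.
pooled_mean = mean_pool(encoder_out)

# Option 2: keep only the vector at the first position.
pooled_first = encoder_out[0]

probs_mean = softmax(linear(pooled_mean, W, b))
probs_first = softmax(linear(pooled_first, W, b))

print("mean pooling :", probs_mean)
print("first vector :", probs_first)
```

In both cases the 10 vectors are reduced to a single vector before the two-class layer, and no decoder is involved. What I am unsure about is which reduction is the right one to train with.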