How do I do multi-class classification with the BERT model?

I have a set of Russian-language texts, each labeled with several classes, in the form:

Text     Class 1   Class 2   Class N
text 1   0         1         0
text 2   1         0         1
text 3   0         1         1
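
In code terms, each text carries one 0/1 flag per class, so a single text can belong to several classes at once. A minimal sketch of that label layout, using only the three example rows above (the names `class_names`, `rows` and `to_multi_hot` are illustrative, not taken from my actual code):

```python
# Illustrative only: the three example rows from the table above.
# "Class N" stands in for the last of several classes.
class_names = ["Class 1", "Class 2", "Class N"]

rows = [
    ("text 1", {"Class 1": 0, "Class 2": 1, "Class N": 0}),
    ("text 2", {"Class 1": 1, "Class 2": 0, "Class N": 1}),
    ("text 3", {"Class 1": 0, "Class 2": 1, "Class N": 1}),
]


def to_multi_hot(flags, names):
    """Return the 0/1 flags as a list ordered like `names`."""
    return [int(flags[name]) for name in names]


texts = [text for text, _ in rows]
labels = [to_multi_hot(flags, class_names) for _, flags in rows]

# One label vector per text, one position per class.
for text, label in zip(texts, labels):
    print(text, label)
```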

I built a classifier like the one in this article, changing only the number of output neurons:

But BERT then behaves like a trivial classifier: for some classes it always outputs ones, or always zeros, regardless of the input text.
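
To make the symptom concrete, here is a rough sketch of how this kind of behaviour can be detected. The `preds` values are made up for illustration and are not my real model outputs, and `constant_classes` is a hypothetical helper name:

```python
def constant_classes(preds):
    """Return indices of classes whose predicted value is the same for every sample."""
    n_classes = len(preds[0])
    stuck = []
    for j in range(n_classes):
        # Collect the distinct values predicted for class j across all samples.
        distinct = {row[j] for row in preds}
        if len(distinct) == 1:
            stuck.append(j)
    return stuck


# Made-up 0/1 predictions (rows = texts, columns = classes), for illustration only.
preds = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
]

stuck = constant_classes(preds)
print(stuck)
```

Here the first two classes never change across texts, which is the pattern I see.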

I also tried using a different tokenizer, but the result is the same.

What’s the problem? What am I doing wrong? Please help. I’m new to NLP, I’ve only done image processing before.