I have a set of Russian-language texts, each annotated with several classes, in the form:
|Text||Class 1||Class 2||…||Class N|
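To be concrete about how I read this, here is roughly how I parse each row into a text plus a multi-hot label vector (the exact delimiter handling and the `parse_row` helper are my own sketch of the format above, not library code):

```python
# Hypothetical loader for rows like "|Text||Class 1||Class 2||...||Class N|";
# the delimiter layout is an assumption based on the format shown above.
def parse_row(row: str):
    # Strip the outer "|" characters, then split the fields on "||"
    fields = row.strip().strip("|").split("||")
    text, labels = fields[0], fields[1:]
    # Multi-hot target: 1 if the class column is marked, else 0
    target = [1 if v.strip() in ("1", "true") else 0 for v in labels]
    return text, target

text, target = parse_row("|пример текста||1||0||1|")
# text   -> "пример текста"
# target -> [1, 0, 1]
```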
I built a classifier as in this article, changing only the number of output neurons:
But BERT behaves like a degenerate classifier: for a given label it always predicts a one or always a zero, regardless of the input.
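To show what I mean by "always ones or zeros": after a sigmoid and a 0.5 threshold, every input collapses to the same prediction vector. The logits below are made up purely to illustrate the pattern I am seeing (they are saturated, so every row thresholds identically):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up logits for three *different* inputs; in my runs they look
# saturated like this, so thresholding gives the same vector every time.
batch_logits = [
    [8.1, -7.9, 9.2],
    [7.5, -8.3, 8.8],
    [8.9, -7.2, 9.5],
]

preds = [[1 if sigmoid(z) > 0.5 else 0 for z in row] for row in batch_logits]
# Every row comes out identical: [1, 0, 1]
```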
I also tried using a different tokenizer, but the result is the same.
What is the problem here, and what am I doing wrong? I'm new to NLP; I've only worked on image processing before.