How to use model for inference (biomed NER BERT Tagger)

Yes, I trained my model as a binary classifier: an entity is either a gene/protein or it is not. I simply want to calculate the ratio of medical words to all words to determine whether a document can be classified as medical. I uploaded my whole code, with a high-level explanation of every step, here: https://github.com/marcelbra/DocTagger
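
To make the ratio idea concrete, here is a minimal sketch of how it could be computed from per-word predictions. The label names ("O" for non-entity, "ENTITY" for gene/protein), the function names and the 0.05 threshold are assumptions for illustration only, not what the repo actually uses.

```python
from typing import Sequence


def medical_word_ratio(labels: Sequence[str], outside_label: str = "O") -> float:
    """Fraction of words whose predicted label is anything other than outside_label."""
    if not labels:
        return 0.0  # avoid dividing by zero for an empty document
    medical = sum(1 for label in labels if label != outside_label)
    return medical / len(labels)


def is_medical(labels: Sequence[str], threshold: float = 0.05) -> bool:
    """Call a document medical if the ratio reaches a (hypothetical) threshold."""
    return medical_word_ratio(labels) >= threshold


# Hypothetical predictions for an 8-word sentence, one label per word.
predicted = ["O", "O", "ENTITY", "ENTITY", "O", "O", "ENTITY", "O"]
print(medical_word_ratio(predicted))  # 3 of 8 words are tagged
print(is_medical(predicted))
```

The threshold would need tuning on documents you already know to be medical or non-medical.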

You can easily use a different data set. You can also keep the same one, but if you want to use GENETAG and differentiate between gene and protein, you need to change the labels the model uses.
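
As a rough sketch of what "changing the labels" means, the label list and the label/id mappings built from it are what differ between the two setups. The label names below are hypothetical; use whatever tags your data set actually contains, and how you pass them to training depends on the version of run_ner you use.

```python
from typing import Dict, List, Tuple


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Build the label -> id and id -> label mappings a token classifier needs."""
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


# Binary setup (assumed names): a word is an entity or it is not.
BINARY_LABELS = ["O", "ENTITY"]

# Finer-grained setup (assumed names): separate gene and protein tags.
GENE_PROTEIN_LABELS = ["O", "B-GENE", "I-GENE", "B-PROTEIN", "I-PROTEIN"]

label2id, id2label = build_label_maps(GENE_PROTEIN_LABELS)
print(len(label2id))  # number of output classes the model head must have
print(id2label[1])
```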

run_ner and utils_ner are from the transformers repo; with these you can train a model.
doc_builder is where the actual tagging of a document happens (I’m working on CORD-19).
pred_ner does the actual prediction. Note that I may have modified the script I posted earlier!

Hope it helps :slight_smile: