I noticed that multiclass classification on the same dataset gives different results each time I train a bidirectional LSTM. I am mostly concerned about the weighted avg f1 and weighted avg recall in the classification report. I simply re-run the same Jupyter notebook and get a different result. For example, the first run gives a weighted avg f1 of 0.17 and the next run gives 0.26. I use the classic
criterion = nn.CrossEntropyLoss()
Also, I use
train_iterator, valid_iterator = BucketIterator.splits(
    (train_ds, val_ds),
    batch_sizes=(train_batch_size, valid_batch_size),
    sort_key=lambda x: len(x.text),
    sort=False,
    sort_within_batch=True,
    device=device,
)
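One possible source of the run-to-run difference is unseeded randomness (weight initialization, batch shuffling, dropout). Below is a minimal sketch of seeding the random number generators before building the model and iterators. The helper name `seed_everything` and the seed value 1234 are illustrative assumptions and are not part of my notebook. Even with this in place, some GPU kernels may still behave nondeterministically.

```python
import random

import torch


def seed_everything(seed: int = 1234) -> None:
    """Seed the Python and PyTorch RNGs (hypothetical helper, not from the notebook)."""
    random.seed(seed)                 # Python's built-in RNG
    torch.manual_seed(seed)           # PyTorch RNG
    torch.cuda.manual_seed_all(seed)  # all GPUs; silently ignored without CUDA
    # Ask cuDNN for deterministic kernels; this may slow training down.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Re-seeding with the same value reproduces the same random draws.
seed_everything(1234)
first = torch.rand(3)
seed_everything(1234)
second = torch.rand(3)
print(torch.equal(first, second))
```

Calling such a helper once at the top of the notebook, before the model and the iterators are created, would make it possible to check whether the gap between 0.17 and 0.26 comes from random initialization and shuffling or from something else.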
I would like to hear your opinion.