Same parameters, different results across training runs with cross-entropy loss

I noticed that training a bidirectional LSTM for multiclass classification on the same dataset gives different results every time. I am mostly concerned about the weighted avg f1 and weighted avg recall in the classification report. I simply rerun the same Jupyter notebook and get a different result: for example, the first run gives a weighted avg f1 of 0.17 and the next one gives 0.26. I use the classic

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
```

Also, I use


and the following iterators:

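```python
# BucketIterator comes from torchtext.data (older releases)
# or torchtext.legacy.data (torchtext 0.9 to 0.11).
train_iterator, valid_iterator = BucketIterator.splits(
    (train_ds, val_ds),
    batch_sizes=(train_batch_size, valid_batch_size),
    sort_key=lambda x: len(x.text),  # bucket examples by sequence length
    sort=False,
    sort_within_batch=True,  # sort each batch by length (e.g. for packed sequences)
    device=device)
```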
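One likely source of run-to-run variation here is batch order. As far as I know, legacy torchtext's `BucketIterator` shuffles the training split by default (`shuffle` falls back to the `train` flag, which `splits` sets for the first dataset), using a shuffler that snapshots Python's global `random` state when the iterator is built. So even with `sort=False`, the batch order changes after a kernel restart unless `random.seed(...)` is called before `BucketIterator.splits`. The sketch below mimics that mechanism with a hypothetical `SnapshotShuffler` class; it is an illustration under that assumption, not the torchtext API.

```python
import random


class SnapshotShuffler:
    """Hypothetical stand-in for legacy torchtext's shuffler: it snapshots
    Python's global `random` state at construction and shuffles with it."""

    def __init__(self):
        self._state = random.getstate()

    def __call__(self, data):
        saved = random.getstate()            # keep the caller's RNG state intact
        random.setstate(self._state)
        try:
            result = random.sample(data, len(data))
            self._state = random.getstate()  # advance the internal state per epoch
        finally:
            random.setstate(saved)
        return result


def batch_order(seed=None):
    """Return the shuffled batch order for two epochs (hypothetical helper)."""
    if seed is not None:
        random.seed(seed)
    shuffler = SnapshotShuffler()    # built at "iterator construction" time
    batches = list(range(10))        # stand-in for 10 bucketed batches
    return [shuffler(batches) for _ in range(2)]


print(batch_order(0) == batch_order(0))  # True: same seed, same batch order
```

Seeding `random` this way only fixes the data order; it does not remove nondeterminism from weight initialization, dropout, or GPU kernels.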

I would like to hear your opinion.

Thank you

Setting the seed alone might not be sufficient to get deterministic results; you would also have to disable non-deterministic algorithms, as described in the Reproducibility docs.
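A minimal sketch of what that typically looks like, based on the settings described in the PyTorch Reproducibility notes. The helper name `make_deterministic` and the seed value are hypothetical; call it at the very top of the notebook, before building the iterators and the model, so that weight initialization, dropout masks, and batch shuffling all start from the same state.

```python
import os
import random

import numpy as np
import torch


def make_deterministic(seed: int = 42) -> None:
    # cuBLAS needs this workspace setting for deterministic results on
    # CUDA >= 10.2; it must be set before any CUDA work starts.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

    random.seed(seed)        # Python RNG (also drives legacy torchtext shuffling)
    np.random.seed(seed)     # NumPy RNG
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices

    # cuDNN: choose deterministic kernels and disable autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    # Raise an error if an op without a deterministic implementation is used.
    torch.use_deterministic_algorithms(True)


# Usage: the same seed reproduces the same random draws.
make_deterministic(0)
a = torch.rand(3)
make_deterministic(0)
b = torch.rand(3)
print(torch.equal(a, b))  # True
```

`use_deterministic_algorithms(True)` is the strict option: it fails loudly instead of silently producing different results, which makes it easier to find the op responsible. Deterministic kernels can be slower, so this is usually enabled only while debugging or comparing runs.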