I have a multi-label problem where I need to calculate the F1 metric, and I am currently using scikit-learn's f1_score with average='samples'.
Is it correct to sum the F1 score over each batch and then divide by the length of the dataset to get the epoch value? Currently I am getting an F1 score of about 0.40, which seems too high considering how imbalanced my dataset is.
My data is multi-label; an example target would be [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1].
I am using BCEWithLogitsLoss, and I apply a sigmoid to the output followed by a threshold to get the comparable binary prediction (predicted in the code below).
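For reference, that thresholding step looks roughly like this (the logits tensor here is made up; the real one would come from the model):

```python
import torch

# hypothetical batch of 2 samples with 14 labels each (raw model logits)
logits = torch.tensor(
    [[ 2.0, -1.5,  0.3, -3.0,  1.2, -0.1,  4.0, -2.2, -0.5, -1.0,  3.3, -0.8,  0.9,  1.5],
     [-1.0,  0.2, -2.0,  1.8, -0.4,  2.5, -3.1,  0.7, -1.2, -0.3, -2.8,  1.1, -0.6,  2.0]])

# sigmoid maps logits into [0, 1]; thresholding at 0.5 yields the binary prediction
# (note: sigmoid(x) > 0.5 is equivalent to x > 0)
predicted = (torch.sigmoid(logits) > 0.5).int()
```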
Code Example
for epoch in range(500):
    running_f1 = 0.0
    for i, batch in enumerate(custom_train_loader):
        # ...training / loss etc.
        # weight each batch's score by its actual size, since the
        # last batch may be smaller than batch_size
        running_f1 += f1_score(labels.cpu().int().numpy(),
                               predicted.cpu().int().numpy(),
                               average='samples') * labels.size(0)
    # divide by the size of the dataset actually being iterated
    epoch_f1 = running_f1 / len(custom_train_loader.dataset)
You could instead append each batch's labels and predicted arrays to separate lists, stack them into arrays once the epoch is done, and compute the F1 score in a single call over the whole epoch.
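A minimal sketch of that approach, with made-up arrays standing in for the per-batch labels and predictions:

```python
import numpy as np
from sklearn.metrics import f1_score

all_labels, all_preds = [], []

# stand-in for the batch loop: two batches of multi-label targets and predictions
fake_batches = [
    (np.array([[0, 1, 1], [1, 0, 0]]), np.array([[0, 1, 0], [1, 0, 0]])),
    (np.array([[1, 1, 0], [0, 0, 1]]), np.array([[1, 1, 0], [0, 1, 1]])),
]

for labels, predicted in fake_batches:
    # in the real loop these would be labels.cpu().int().numpy() etc.
    all_labels.append(labels)
    all_preds.append(predicted)

# one f1_score call over the full epoch instead of averaging per-batch scores
epoch_f1 = f1_score(np.vstack(all_labels), np.vstack(all_preds), average='samples')
```

This avoids the weighting question entirely, since average='samples' then averages the per-sample F1 over every sample in the epoch at once.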