I am currently working on a multi-label binary classification problem.

The output of my model is a tensor like this:

`tensor([[3.0914e-08, 3.2459e-17]])`

and the ground truth label looks like this:

`tensor([[1, 1]])`

I iterate over a custom validation DataLoader (after training for one epoch) and for every input and label I execute:

```
prediction = self._model(x)
# pass the model output computed above (`output` was never defined here)
loss = self._crit(prediction, y.float())
return loss, prediction
```

Where `self._crit` is BCELoss. Now I am trying to calculate the F1 score using `sklearn.metrics` via:

`f1_score(label[0], (prediction[0] > 0.5).float() * 1)`

I don't know whether this is the correct approach, but the f1_score always ends up being 0.
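To make the zero score reproducible, here is a minimal sketch using the exact values shown above. It assumes NumPy arrays as stand-ins for the tensors, and the `zero_division=0` argument is my addition to make the undefined-precision case explicit. Because both probabilities are far below 0.5, the thresholded prediction is all zeros while both labels are 1, so there are no true positives.

```python
import numpy as np
from sklearn.metrics import f1_score

# Values copied from the post; NumPy arrays stand in for the torch tensors.
prediction = np.array([[3.0914e-08, 3.2459e-17]])
label = np.array([[1, 1]])

# Thresholding at 0.5 maps both probabilities to 0.
binarized = (prediction[0] > 0.5).astype(int)

# With 1-D inputs, sklearn treats the two labels as two samples of one
# binary problem. No positives are predicted, so precision is 0/0;
# zero_division=0 (an assumption, not in the original call) reports it as 0.
score = f1_score(label[0], binarized, zero_division=0)
print(binarized, score)
```

So for this sample the score is 0 simply because the model predicts neither label, not necessarily because the `f1_score` call is malformed.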

Instead of calculating this for every validation sample, should I compute it once over the whole validation set?

If so - how?
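For reference, this is roughly what I imagine whole-set evaluation could look like: collect the thresholded predictions and labels from every batch, concatenate them, and call `f1_score` once. It is only a sketch under assumptions: the helper name `f1_over_validation_set`, the made-up batches, and the choice of `average="macro"` are all hypothetical, and NumPy arrays stand in for the tensors (in the real loop they would probably come from something like `prediction.detach().cpu().numpy()`).

```python
import numpy as np
from sklearn.metrics import f1_score


def f1_over_validation_set(batches, threshold=0.5):
    """Hypothetical helper: one F1 score over all validation batches.

    `batches` yields (probabilities, labels) pairs, each an array of
    shape (batch_size, n_labels).
    """
    all_preds, all_labels = [], []
    for probs, labels in batches:
        # Binarize the probabilities per label and keep them for later.
        all_preds.append((np.asarray(probs) > threshold).astype(int))
        all_labels.append(np.asarray(labels).astype(int))

    # Stack into (n_samples, n_labels) multi-label indicator matrices.
    y_pred = np.concatenate(all_preds, axis=0)
    y_true = np.concatenate(all_labels, axis=0)

    # average="macro" is an assumption: F1 per label, then the unweighted mean.
    return f1_score(y_true, y_pred, average="macro", zero_division=0)


# Made-up batches purely for illustration (not real model outputs).
batches = [
    (np.array([[0.9, 0.2]]), np.array([[1, 1]])),
    (np.array([[0.8, 0.7]]), np.array([[1, 1]])),
    (np.array([[0.1, 0.6]]), np.array([[0, 1]])),
]
score = f1_over_validation_set(batches)
print(score)
```

Other `average` settings (`"micro"`, `"samples"`, or `None` for one score per label) would aggregate differently, so which one fits depends on how the labels should be weighted.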