F1 weighted score about BERT model in pytorch

I have created a function for evaluation a function. It takes as an input the model and validation data loader and return the validation accuracy, validation loss and f1_weighted score.

def evaluate(model, val_dataloader):
    """
    After the completion of each training epoch, measure the model's performance
    on our validation set.
    """
    # Put the model into the evaluation mode. The dropout layers are disabled during
    # the test time.
    model.eval()

    # Tracking variables
    val_accuracy = []
    val_loss = []
    f1_weighted = []

    # For each batch in our validation set...
    for batch in val_dataloader:
        # Load batch to GPU
        b_input_ids, b_attn_mask, b_labels = tuple(t.to(device) for t in batch)

        # Compute logits
        with torch.no_grad():
            logits = model(b_input_ids, b_attn_mask)

        # Compute loss
        loss = loss_fn(logits, b_labels)
        val_loss.append(loss.item())

        # Get the predictions
        preds = torch.argmax(logits, dim=1).flatten()

        # Calculate the accuracy rate
        accuracy = (preds == b_labels).cpu().numpy().mean() * 100
        val_accuracy.append(accuracy)

        # Calculate the f1 weighted score
        f1_metric = F1Score('weighted') 
        f1_weighted = f1_metric(preds, b_labels)

    # Compute the average accuracy and loss over the validation set.
    val_loss = np.mean(val_loss)
    val_accuracy = np.mean(val_accuracy)
    f1_weighted = np.mean(f1_weighted)

    return val_loss, val_accuracy, f1_weighted 

The function works well without f1 score, but when it’s inside of the function there is the following error

TypeError                                 Traceback (most recent call last)
<ipython-input-48-0e0f6d227c4f> in <module>()
      1 set_seed(42)    # Set seed for reproducibility
      2 bert_classifier, optimizer, scheduler = initialize_model(epochs=4)
----> 3 train(bert_classifier, train_dataloader, val_dataloader, epochs=4, evaluation=True)
      4 
      5 #1. 77.28

2 frames
<__array_function__ internals> in mean(*args, **kwargs)

/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in mean(a, axis, dtype, out, keepdims)
   3368             pass
   3369         else:
-> 3370             return mean(axis=axis, dtype=dtype, out=out, **kwargs)
   3371 
   3372     return _methods._mean(a, axis=axis, dtype=dtype,

TypeError: mean() received an invalid combination of arguments - got (out=NoneType, axis=NoneType, dtype=NoneType, ), but expected one of:
 * (*, torch.dtype dtype)
 * (tuple of ints dim, bool keepdim, *, torch.dtype dtype)
 * (tuple of names dim, bool keepdim, *, torch.dtype dtype)

The core for f1 score can be found here