F1 weighted score for a BERT model in PyTorch

I have created a function for evaluating a model. It takes the model and a validation data loader as input and returns the validation accuracy, validation loss, and F1 weighted score.

def evaluate(model, val_dataloader):
    """After the completion of each training epoch, measure the model's
    performance on our validation set."""
    # Put the model into evaluation mode. The dropout layers are disabled
    # during test time.
    model.eval()

    # Tracking variables
    val_accuracy = []
    val_loss = []
    f1_weighted = []

    # For each batch in our validation set...
    for batch in val_dataloader:
        # Load batch to GPU
        b_input_ids, b_attn_mask, b_labels = tuple(t.to(device) for t in batch)

        # Compute logits
        with torch.no_grad():
            logits = model(b_input_ids, b_attn_mask)

        # Compute loss
        loss = loss_fn(logits, b_labels)
        val_loss.append(loss.item())

        # Get the predictions
        preds = torch.argmax(logits, dim=1).flatten()

        # Calculate the accuracy rate
        accuracy = (preds == b_labels).cpu().numpy().mean() * 100
        val_accuracy.append(accuracy)

        # Calculate the f1 weighted score
        f1_metric = F1Score('weighted')
        f1_weighted = f1_metric(preds, b_labels)

    # Compute the average accuracy and loss over the validation set.
    val_loss = np.mean(val_loss)
    val_accuracy = np.mean(val_accuracy)
    f1_weighted = np.mean(f1_weighted)

    return val_loss, val_accuracy, f1_weighted 

The function works well without the F1 score, but when the F1 computation is inside the function, I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-48-0e0f6d227c4f> in <module>()
      1 set_seed(42)    # Set seed for reproducibility
      2 bert_classifier, optimizer, scheduler = initialize_model(epochs=4)
----> 3 train(bert_classifier, train_dataloader, val_dataloader, epochs=4, evaluation=True)
      5 #1. 77.28

2 frames
<__array_function__ internals> in mean(*args, **kwargs)

/usr/local/lib/python3.7/dist-packages/numpy/core/fromnumeric.py in mean(a, axis, dtype, out, keepdims)
   3368             pass
   3369         else:
-> 3370             return mean(axis=axis, dtype=dtype, out=out, **kwargs)
   3372     return _methods._mean(a, axis=axis, dtype=dtype,

TypeError: mean() received an invalid combination of arguments - got (out=NoneType, axis=NoneType, dtype=NoneType, ), but expected one of:
 * (*, torch.dtype dtype)
 * (tuple of ints dim, bool keepdim, *, torch.dtype dtype)
 * (tuple of names dim, bool keepdim, *, torch.dtype dtype)

The code for the F1 score can be found here
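I suspect the failure comes from calling `np.mean` on a `torch.Tensor`: NumPy delegates to the tensor's own `.mean()` method with `axis=`, `dtype=`, and `out=` keywords that `torch.Tensor.mean` does not accept, which matches the TypeError above. A minimal sketch of the usual workaround, converting each per-batch score to a Python float with `.item()` before averaging (`batch_scores` below is a made-up stand-in for the metric's per-batch outputs):

```python
import numpy as np
import torch

# Made-up stand-in for per-batch F1 values: a torchmetrics-style metric
# returns a 0-dim torch.Tensor for each batch, not a Python float.
batch_scores = [torch.tensor(0.80), torch.tensor(0.90)]

# np.mean(batch_scores[0]) would fail: NumPy forwards axis=/dtype=/out=
# keywords that torch.Tensor.mean does not accept.

# Converting each tensor to a plain Python float first avoids the issue.
f1_per_batch = [score.item() for score in batch_scores]
avg_f1 = float(np.mean(f1_per_batch))
print(round(avg_f1, 2))
```

The same conversion is why appending `loss.item()` (rather than the loss tensor itself) keeps `np.mean(val_loss)` working.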