I have created a function for evaluation a function. It takes as an input the model and validation data loader and return the validation accuracy, validation loss and f1_weighted score.
def evaluate(model, val_dataloader):
"""
After the completion of each training epoch, measure the model's performance
on our validation set.
"""
# Put the model into the evaluation mode. The dropout layers are disabled during
# the test time.
model.eval()
# Tracking variables
val_accuracy = []
val_loss = []
f1_weighted = []
# For each batch in our validation set...
for batch in val_dataloader:
# Load batch to GPU
b_input_ids, b_attn_mask, b_labels = tuple(t.to(device) for t in batch)
# Compute logits
with torch.no_grad():
logits = model(b_input_ids, b_attn_mask)
# Compute loss
loss = loss_fn(logits, b_labels)
val_loss.append(loss.item())
# Get the predictions
preds = torch.argmax(logits, dim=1).flatten()
# Calculate the accuracy rate
accuracy = (preds == b_labels).cpu().numpy().mean() * 100
val_accuracy.append(accuracy)
# Calculate the f1 weighted score
f1_metric = F1Score('weighted')
f1_weighted = f1_metric(preds, b_labels)
# Compute the average accuracy and loss over the validation set.
val_loss = np.mean(val_loss)
val_accuracy = np.mean(val_accuracy)
f1_weighted = np.mean(f1_weighted)
return val_loss, val_accuracy, f1_weighted
The core for f1 score can be found here
Before the evaluation function there is a function which trains a bert model and has the following inputs
train(model, train_dataloader, val_dataloader, epochs, evaluation).
Thus if the evaluation = True, then the validation accuracy seems in the end of each epoch.
As for the dataloaders are created with the following way:
# Convert other data types to torch.Tensor
train_labels = torch.tensor(authors_train)
# Create the DataLoader for our training set
train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
With a similar way you cal create the dataloader for validation and testing set.
Update:
I changed the line
f1_weighted = f1_metric(preds, b_labels)
with this one
f1_weighted.append(f1_metric(preds, b_labels))
and now I have the following error
AttributeError Traceback (most recent call last)
<ipython-input-49-0e0f6d227c4f> in <module>()
1 set_seed(42) # Set seed for reproducibility
2 bert_classifier, optimizer, scheduler = initialize_model(epochs=4)
----> 3 train(bert_classifier, train_dataloader, val_dataloader, epochs=4, evaluation=True)
4
5 #1. 77.28
3 frames
<__array_function__ internals> in mean(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
168 ret = arr.dtype.type(ret / rcount)
169 else:
--> 170 ret = ret.dtype.type(ret / rcount)
171 else:
172 ret = ret / rcount
AttributeError: 'torch.dtype' object has no attribute 'type'