I’m trying to evaluate some models based on their accuracy (% of correctly classified input examples) and their F1 scores. I’m fairly sure that I’m computing the two in the correct way, but I was surprised to see that they return the exact same score. The code I’m using is below:
epoch_loss = 0
epoch_accuracy = 0
epoch_f_one = 0
for i_batch, batch in enumerate(data_loader):
    visual_samples = batch['visual_sample'].float().to(device)
    audio_samples = batch['audio_sample'].float().to(device)
    labels = batch['label'].long().to(device) - 1  # shift labels so classes start at 0
    optimizer.zero_grad()
    outputs = net(visual_samples, audio_samples).float()
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    epoch_loss += loss.item()
    # compare predictions to labels in torch so this also works for GPU tensors
    preds = torch.argmax(outputs, dim=1)
    batch_accuracy = (preds == labels).sum().item() / labels.size(0)
    epoch_accuracy += batch_accuracy
    # sklearn expects CPU numpy arrays, so detach and move off the device first
    y_true = labels.detach().cpu().numpy()
    y_pred = preds.detach().cpu().numpy()
    f1 = f1_score(y_true=y_true, y_pred=y_pred, average='micro')
    epoch_f_one += f1
epoch_loss = epoch_loss / (i_batch + 1)
epoch_accuracy = epoch_accuracy / (i_batch + 1)
epoch_f_one = epoch_f_one / (i_batch + 1)
max_batch = i_batch
return epoch_loss, epoch_accuracy, epoch_f_one, max_batch
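One thing I noticed while writing this up: averaging per-batch means like I do above gives a smaller final batch the same weight as a full one, so it isn't quite the exact epoch accuracy. A toy sketch of what I mean (made-up numpy arrays, not my real data):

```python
import numpy as np

# Hypothetical (predictions, labels) batches; the last batch is smaller.
batches = [
    (np.array([0, 1, 2, 1]), np.array([0, 1, 1, 1])),  # 3/4 correct
    (np.array([2, 0]), np.array([2, 1])),              # 1/2 correct
]

correct = 0
total = 0
for preds, labels in batches:
    correct += int((preds == labels).sum())
    total += len(labels)

exact_acc = correct / total  # 4/6, counts pooled over the whole epoch
mean_of_means = np.mean([(p == l).mean() for p, l in batches])  # (0.75 + 0.5) / 2
print(exact_acc, mean_of_means)  # 0.666... vs 0.625
```

The discrepancy is small when all batches have equal size, but pooling counts is exact either way.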
Am I calculating something wrong here, or is it expected that the per-epoch accuracy turns out to be the same as the micro-averaged F1, since both come down to the fraction of correct predictions averaged over the batches?
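To check whether this is just a coincidence of my data, I tried a tiny standalone comparison with made-up multiclass labels (nothing to do with my model):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical single-label multiclass predictions, 6 of 8 correct
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])

acc = accuracy_score(y_true, y_pred)
micro_f1 = f1_score(y_true, y_pred, average='micro')
print(acc, micro_f1)  # both 0.75
```

Even here the two come out identical, so it doesn't look specific to my setup.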
Many thanks.