I’m trying to evaluate some models based on their accuracy (% of correctly classified examples) and their F1 scores. I’m fairly sure that I’m computing the two correctly, but I was surprised to see that they return the exact same score. The code I’m using is below:
```python
epoch_loss = 0
epoch_accuracy = 0
epoch_f_one = 0

for i_batch, batch in enumerate(data_loader):
    visual_samples = batch['visual_sample'].float().to(device)
    audio_samples = batch['audio_sample'].float().to(device)
    labels = batch['label'].long().to(device) - 1  # shift labels to start at 0

    optimizer.zero_grad()
    outputs = net(visual_samples, audio_samples).float()
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    epoch_loss += loss.item()

    y_pred = torch.argmax(outputs, dim=1)
    # fraction of correct predictions in this batch
    batch_accuracy = (y_pred == labels).sum().item() / visual_samples.size(0)
    epoch_accuracy += batch_accuracy

    # f1_score expects CPU arrays, so move the tensors off the GPU first
    f1 = f1_score(y_true=labels.cpu(), y_pred=y_pred.cpu(), average='micro')
    epoch_f_one += f1

# average the running batch totals over the number of batches
epoch_loss = epoch_loss / (i_batch + 1)
epoch_accuracy = epoch_accuracy / (i_batch + 1)
epoch_f_one = epoch_f_one / (i_batch + 1)
max_batch = i_batch

return epoch_loss, epoch_accuracy, epoch_f_one, max_batch
```
Am I calculating something wrong here, or does the per-epoch accuracy simply turn out to be the same as the micro-averaged F1, since they are both effectively means of correct guesses over the batches?
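For reference, here is a minimal standalone snippet (with made-up toy labels, independent of my model) that seems to reproduce the behaviour: for single-label multiclass predictions, micro-averaged F1 appears to be identical to accuracy, since every misclassification counts as exactly one false positive (for the predicted class) and one false negative (for the true class).

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# hypothetical toy labels for a 3-class problem
y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 2])

acc = accuracy_score(y_true, y_pred)                  # 3 correct out of 6 -> 0.5
micro_f1 = f1_score(y_true, y_pred, average='micro')  # also 0.5

print(acc, micro_f1)
```

Both values come out equal here, which matches what I see per batch in my training loop.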