Issues calculating roc_score for binary classification

I am trying to calculate the ROC score per epoch. My idea is to combine all the batch predictions and then calculate the score, but I keep getting an error. How do you approach this? I am predicting values between 0 and 1.

train_losses = []
valid_losses = []
for epoch in range(1, num_epochs + 1):
    y_true = []
    y_pred = []
    train_loss = 0.0
    valid_loss = 0.0
    for data, target in train_loader:
        data =
        target =
        output = model(data)
        target = target.unsqueeze(1).type_as(output)
        loss = criterion(output, target)
        train_loss += loss.item() * data.size(0)
    with torch.no_grad():
        for data, target in valid_loader:
            data =
            target =
            output = model(data)
            target = target.unsqueeze(1).type_as(output)
            loss = criterion(output, target)
            valid_loss += loss.item() * data.size(0)
    roc_score = roc_auc_score(y_true,y_pred)
    acc = (PREDS == TARGETS).mean() * 100.
    # calculate-average-losses
    train_loss = train_loss/len(train_loader.sampler)
    valid_loss = valid_loss/len(valid_loader.sampler)
    # print-training/validation-statistics
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f} \tValidation ROCSCORE: {:.6f}'.format(epoch, train_loss, valid_loss,roc_score))


ValueError                                Traceback (most recent call last)
<ipython-input-69-a1c7781e11ef> in <module>()
     29             y_true.append(
     30             y_pred.append(
---> 31     roc_score = roc_auc_score(y_true,y_pred)
     32     acc = (PREDS == TARGETS).mean() * 100.
     33     # calculate-average-losses

1 frames
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/ in roc_auc_score(y_true, y_score, average, sample_weight, max_fpr, multi_class, labels)
    367     y_type = type_of_target(y_true)
--> 368     y_true = check_array(y_true, ensure_2d=False, dtype=None)
    369     y_score = check_array(y_score, ensure_2d=False)

/usr/local/lib/python3.6/dist-packages/sklearn/utils/ in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    572         if not allow_nd and array.ndim >= 3:
    573             raise ValueError("Found array with dim %d. %s expected <= 2."
--> 574                              % (array.ndim, estimator_name))
    576         if force_all_finite:

ValueError: Found array with dim 3. Estimator expected <= 2.

I computed that manually in Chapter 14 of our book by sorting the predictions, then getting tp / fp per threshold with broadcasting (cell 4), and then doing the numerical integration using the trapezoidal rule (cell 5).
Lazy, but works well enough for me.
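
For readers without the notebook at hand, here is a rough sketch of that idea in NumPy. It is not the code from the book; the function name manual_roc_auc and the variable names are made up for illustration, and it assumes binary labels coded as 0 and 1.

```python
import numpy as np

def manual_roc_auc(y_true, y_score):
    """ROC AUC via sorted thresholds, broadcasting, and the trapezoidal rule."""
    y_true = np.asarray(y_true).ravel()
    y_score = np.asarray(y_score).ravel()

    # Use every score as a threshold, from highest to lowest.
    thresholds = np.sort(y_score)[::-1]

    # Broadcasting: row i marks which samples are predicted positive
    # at threshold i. Shape is (n_thresholds, n_samples).
    predicted_pos = y_score[None, :] >= thresholds[:, None]

    is_pos = y_true == 1
    tp = (predicted_pos & is_pos[None, :]).sum(axis=1)
    fp = (predicted_pos & ~is_pos[None, :]).sum(axis=1)

    # Prepend the (0, 0) corner of the ROC curve.
    tpr = np.concatenate([[0.0], tp / is_pos.sum()])
    fpr = np.concatenate([[0.0], fp / (~is_pos).sum()])

    # Trapezoidal rule: width in FPR times average height in TPR.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

auc = manual_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The broadcasting step builds an n x n boolean matrix, so this is fine for a validation set of a few thousand samples but not for very large ones.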

Best regards


@tom I checked the notebook. However, the problem is not computing the score; I don’t know how to extract my batch predictions so that I can calculate the ROC score.

So output is between 0 and 1 and has batch x ??? shape?

I need a 2-dimensional array to compute the ROC score. When I run the model forward per batch it gives me a 2-dimensional array, but I need to collect all my batch predictions, so I used list.append(), which turns the predictions into a 3D array, hence I can’t compute the score. Please, how do I store the batch predictions?
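
To make the shape problem concrete, here is a small sketch with made-up data. It assumes each batch output has already been converted to a NumPy array of shape (batch, 1); the batch size of 4 and the three batches are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical batches of predictions, each of shape (4, 1).
batches = [rng.random((4, 1)) for _ in range(3)]

# Converting the list directly adds a new leading "batch index" axis.
stacked = np.asarray(batches)

# Concatenating along the existing sample axis keeps the array 2-D.
joined = np.concatenate(batches, axis=0)

print(stacked.shape)  # (3, 4, 1)
print(joined.shape)   # (12, 1)
```

The first shape is the kind of 3-dimensional array that check_array rejects in the traceback above; the second is what the metric needs.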

Storing them in a list and then concatenating them along dim=0 into a single pred_tensor should do the right thing. I would personally use y_pred.append(output.detach().cpu()) and store a list of torch.Tensors, leaving the conversion to a NumPy array for later (or you might see if the array interface does its magic; with Matplotlib it often does).
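
A minimal sketch of that suggestion follows. The model and valid_loader here are small stand-ins invented so the snippet runs on its own, and using torch.cat for the concatenation is my reading of the advice, so treat the details as assumptions rather than the exact code from the thread.

```python
import torch
from sklearn.metrics import roc_auc_score

torch.manual_seed(0)

# Stand-in model: one linear layer plus sigmoid, so outputs lie in (0, 1).
model = torch.nn.Sequential(torch.nn.Linear(5, 1), torch.nn.Sigmoid())

# Stand-in validation "loader": 3 batches of 8 samples with alternating labels.
valid_loader = [(torch.randn(8, 5), torch.arange(8) % 2) for _ in range(3)]

y_true, y_pred = [], []
model.eval()
with torch.no_grad():
    for data, target in valid_loader:
        output = model(data)                  # shape (batch, 1)
        y_pred.append(output.detach().cpu())  # keep tensors, convert later
        y_true.append(target.detach().cpu())

# Concatenate along the sample axis, then drop the trailing size-1 axis.
pred_tensor = torch.cat(y_pred, dim=0).squeeze(1)  # shape (24,)
true_tensor = torch.cat(y_true, dim=0)             # shape (24,)

roc_score = roc_auc_score(true_tensor.numpy(), pred_tensor.numpy())
print(pred_tensor.shape, roc_score)
```

Resetting y_true and y_pred at the start of every epoch, as the original loop already does, keeps the score per epoch.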