How to load a dataset using scikit-learn in PyTorch?

Hey, I am using scikit-learn to plot the ROC curve for my dataset. I have done my classification using ResNet in PyTorch in Google Colab. I am following this example:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

There they use this to load the Iris dataset:

> #Import some data to play with 
> iris = datasets.load_iris() 
> X = iris.data 
> y = iris.target

I am not getting how to plug my own dataset and results into this to plot a ROC curve. Can you please help me do that?

I am attaching my Google Colab file so you can see my approach: https://drive.google.com/open?id=1f8eTqjDiPBt8AYtQMiQqXTU76PU9fKzC

Thanks.

You just need the class membership probabilities for ROC curves. E.g., if you collect the probabilities, you can do this:

import matplotlib.pyplot as plt
import numpy as np


def plot_roc_curve(y_true, y_score, pos_label=1, num_thresholds=100):

    # convert inputs to NumPy arrays so boolean masking works
    y_true_ary = np.array(y_true)
    y_score_ary = np.array(y_score)
    x_axis_values = []
    y_axis_values = []
    thresholds = np.linspace(0., 1., num_thresholds)

    num_positives = np.sum(y_true_ary == pos_label)
    num_negatives = y_true_ary.shape[0] - num_positives

    for thr in thresholds:

        # binarize the scores at the current threshold
        binarized_scores = np.where(y_score_ary >= thr, pos_label, int(not pos_label))

        positive_predictions = (binarized_scores == pos_label)
        num_true_positives = (y_true_ary[positive_predictions] == pos_label).sum()
        num_false_positives = (y_true_ary[positive_predictions] != pos_label).sum()

        # false positive rate on the x-axis, true positive rate on the y-axis
        x_axis_values.append(num_false_positives / float(num_negatives))
        y_axis_values.append(num_true_positives / float(num_positives))

    plt.step(x_axis_values, y_axis_values, where='post')

    plt.xlim([0., 1.01])
    plt.ylim([0., 1.01])
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')

    return None
plot_roc_curve(y_test, y_probabilities[:, 1], pos_label=1)
plt.show()
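
Here, y_probabilities would be the softmax outputs of your model on the test set and y_test the ground-truth labels. A rough sketch of how you could collect them (model, test_loader, and device are placeholders for whatever you already have in your Colab notebook):

import torch
import torch.nn.functional as F

model.eval()
all_labels, all_probas = [], []

with torch.no_grad():
    for images, labels in test_loader:
        logits = model(images.to(device))
        # softmax turns the logits into class-membership probabilities
        all_probas.append(F.softmax(logits, dim=1).cpu())
        all_labels.append(labels)

y_probabilities = torch.cat(all_probas).numpy()  # shape: (num_samples, num_classes)
y_test = torch.cat(all_labels).numpy()           # shape: (num_samples,)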

(I actually used that as a HW exercise last semester :))


Hey, I found a way to get the class probabilities, that is:

output = model.forward(images)
prob = F.softmax(output, dim=1)
classes = Variable(torch.LongTensor(10, 1).random_(0, 3))
class_prob = torch.gather(prob, 1, classes)

I have a concern though: do I have to change my classes?
I have also updated my post and added the actual file I am working in, to get this curve. Thanks, Sebastian.

> prob = F.softmax(output, dim=1)

Just want to mention that ROC is a binary metric. So if you have more than 2 possible class labels (i.e., more than 2 columns there), you have to select the one class you care about.
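
For instance (purely illustrative, picking class 2 as the class of interest), you would treat that class as positive and everything else as negative:

class_of_interest = 2  # hypothetical choice; pick the class you care about
y_score = y_probabilities[:, class_of_interest]  # probability of that class
y_true_binary = (y_test == class_of_interest).astype(int)

plot_roc_curve(y_true_binary, y_score, pos_label=1)
plt.show()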

I have four classes, so can’t I take all four classes together? Do I need to take just one class and plot it?

Right, you can’t. ROC is a binary metric, so the interpretation in your case would be “given class versus rest” for each class. There are different workarounds for dealing with multiclass problems when using ROC. One is to make a ROC curve for each class versus the rest and/or average the curves.
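
A sketch of that one-vs-rest idea using scikit-learn’s roc_curve (assuming y_test holds the integer labels 0-3 and y_probabilities the softmax outputs collected above):

import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

num_classes = 4
for c in range(num_classes):
    # treat class c as positive and all other classes as negative
    y_true_binary = (y_test == c).astype(int)
    fpr, tpr, _ = roc_curve(y_true_binary, y_probabilities[:, c])
    plt.plot(fpr, tpr, label='class %d vs rest (AUC = %.2f)' % (c, auc(fpr, tpr)))

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()

Macro-averaging those per-class curves then gives a single summary curve, as shown in the scikit-learn example linked above.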
