ROC-AUC is high but PR-AUC value is very low


I am working on DNA sequence data and using a CNN. My dataset is hugely imbalanced:
positive class samples (~500)
negative class samples (~150,000)

So I am using WeightedRandomSampler to oversample and balance the classes before feeding the data to the DataLoader.

I use 5-fold cross-validation. In a few test runs I got a decent ROC-AUC value, but the PR-AUC value is really low.

For fold 1:
roc auc 0.9667848699763594
precision auc 0.055329116326074484

For fold 2:
roc auc 0.8476321207961566
precision auc 0.03307627288669479

For fold 3:
roc auc 0.9528898540612085
precision auc 0.05020178518546394
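
For context on numbers like these: a high ROC-AUC and a low PR-AUC can coexist when positives are rare, because even a small false-positive rate on ~150,000 negatives can outnumber the ~500 positives. Below is a minimal sketch on made-up toy data. The helper names `roc_auc` and `average_precision` are my own, and average precision is only one way to summarize the PR curve, so it may differ from how the "precision auc" above was computed.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (made up): 1000 negatives with scores spread over [0, 1),
# and 2 positives that both score in the top 5%.
labels = [0] * 1000 + [1, 1]
scores = [i / 1000 for i in range(1000)] + [0.9505, 0.9705]

roc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
print(f"ROC-AUC {roc:.3f}, average precision {ap:.3f}, prevalence {prevalence:.4f}")
```

On this toy data the ROC-AUC is 0.961 while the average precision is about 0.036. A random classifier would score roughly the prevalence (about 0.002 here, about 0.003 for 500 vs. 150,000), so a PR-AUC of 0.05 is low in absolute terms but still well above chance.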

I suspect that there are a lot of false negatives. Since the number of positive samples (~500) is very low compared to the number of negative samples (~150,000), the model learns the negative class better and predicts most of the test samples as negative.

I tried weighting the positive class using
weight = [50.0]
class_weight = torch.FloatTensor(weight).to(device)
criterion = nn.BCEWithLogitsLoss(pos_weight=class_weight)
By doing this, almost all samples are predicted as positive.
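
A possible explanation, assuming the weighted sampler was still active at this point (the post does not say): if batches are already roughly balanced, `pos_weight=50` makes every positive count 50 times as much as every negative, which pushes the model toward predicting positive. The sketch below only illustrates how `pos_weight` scales the positive term of `nn.BCEWithLogitsLoss`; the `n_neg / n_pos` heuristic is a common rule of thumb, not something from this thread.

```python
import math
import torch
import torch.nn as nn

n_pos, n_neg = 500, 150_000  # approximate class counts from the post

# Rule of thumb (assumption): pos_weight = n_neg / n_pos, meant for batches
# drawn from the raw, imbalanced data without a balancing sampler.
heuristic_pos_weight = n_neg / n_pos

criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([50.0]))

logit = torch.zeros(1)        # undecided model: sigmoid(0) = 0.5
pos_target = torch.ones(1)
neg_target = torch.zeros(1)

loss_pos = criterion(logit, pos_target).item()  # 50 * ln(2)
loss_neg = criterion(logit, neg_target).item()  # ln(2), not affected by pos_weight
print(heuristic_pos_weight, loss_pos / loss_neg)
```

So it may be worth using either the sampler or `pos_weight`, but not both at full strength.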

I tried adaptive learning rates as well, but the precision-recall values do not seem to improve.
Can someone suggest ideas for improving the precision and recall values?



That’s generally a tough problem. :confused:
Could you post the confusion matrix, so that we get a feeling about the predictions?

I understand :slightly_frowning_face: :
Since the imbalance is too high, the model predicts most of the samples as negatives (class 0)
A sample of my confusion matrix:
[[1023 0]
[ 1 0]]

[[1022 0]
[ 2 0]]

[[1018 0]
[ 6 0]]

I would try to lower the impact of the imbalance a bit and subsample the negative samples before training to e.g. 15000 samples (or even lower), and then retry the WeightedRandomSampler.
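
A minimal sketch of that subsampling step, assuming NumPy arrays with labels 0/1. The function name `subsample_negatives` and the toy shapes are hypothetical. It would be applied to the training fold only, so that the validation folds keep the original class ratio.

```python
import numpy as np


def subsample_negatives(X, y, n_keep, seed=0):
    """Keep all positives and a random subset of n_keep negatives."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    keep_neg = rng.choice(neg_idx, size=min(n_keep, len(neg_idx)), replace=False)
    idx = np.sort(np.concatenate([pos_idx, keep_neg]))
    return X[idx], y[idx]


# Toy stand-in data (hypothetical): 5 positives, 1500 negatives.
y = np.array([1] * 5 + [0] * 1500)
X = np.arange(len(y)).reshape(-1, 1)

X_small, y_small = subsample_negatives(X, y, n_keep=150)
print(np.bincount(y_small))  # class counts after subsampling
```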

Btw. could you post the code snippet you are using to create the class counts, weights, and WeightedRandomSampler?
I would like to make sure nothing goes wrong there, as your model still overfits badly on the negative class.

Yeah sure.

class_sample_count = np.array([len(np.where(Ytr == t)[0]) for t in np.unique(Ytr)])
weight = 1. / class_sample_count
samples_weight = np.array([weight[t] for t in Ytr])
samples_weight = torch.from_numpy(samples_weight)
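sampler = WeightedRandomSampler(samples_weight.type('torch.DoubleTensor'), len(samples_weight))
# The dataset argument was lost when the post was copied; train_dataset is a placeholder name.
train_loader = DataLoader(train_dataset, batch_size=batch_size, num_workers=1, sampler=sampler)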

Maybe I will try to undersample the negative set and see how it behaves.

The code looks alright. Let me know how the experiments work out.
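
One way to double-check a sampler like the one posted above is to count which classes it actually draws over an epoch. This is a sketch on toy data; `Xtr`, the feature size, and the batch size are made up for the example.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy labels (hypothetical): 20 positives, 2000 negatives.
Ytr = np.array([1] * 20 + [0] * 2000)
Xtr = torch.randn(len(Ytr), 4)

class_sample_count = np.bincount(Ytr)   # samples per class: [2000, 20]
weight = 1.0 / class_sample_count       # rarer class gets the larger weight
samples_weight = torch.from_numpy(weight[Ytr]).double()

sampler = WeightedRandomSampler(
    samples_weight, num_samples=len(samples_weight), replacement=True
)
loader = DataLoader(
    TensorDataset(Xtr, torch.from_numpy(Ytr)), batch_size=64, sampler=sampler
)

# Count how often each class is drawn over one epoch.
drawn = torch.zeros(2, dtype=torch.long)
for _, yb in loader:
    drawn += torch.bincount(yb, minlength=2)

pos_fraction = drawn[1].item() / drawn.sum().item()
print(drawn.tolist(), round(pos_fraction, 3))
```

If the sampler is wired up correctly, the positive fraction should land close to 0.5 even though positives are only about 1% of the toy data.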

Sure :slight_smile: Thank you for your response


I tried downsampling too. The results are still the same :frowning:

Ok, thanks for the information.
Let’s maybe try to scale down the problem and have a minimal working version.
Could you post the model definition, so that we could use it as a starter?
If that’s not possible due to licensing etc., feel free to create a “similar” dummy model.

Yeah sure. Thanks for your help.
Here it is

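import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset

class ClDataset(Dataset):
    def __init__(self, X, Y):
        self.len = len(X)
        temp = np.asarray(X, np.float64)
        self.x = torch.from_numpy(temp)
        self.y = torch.from_numpy(np.asarray(Y))

    def __getitem__(self, index):
        # return one (features, label) pair
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.y)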

class ClCNN(nn.Module):

    def __init__(self):
        super(ClCNN, self).__init__()
        # convolutional layer
        self.layer11 = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(4, 3), stride=1, padding=(2, 2)),
            nn.MaxPool2d(kernel_size=2, stride=2))

        self.layer12 = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(4, 5), stride=1, padding=(2, 2)),
            nn.MaxPool2d(kernel_size=2, stride=2))

        self.layer21 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=(3, 2), stride=1, padding=(2, 2)),
            nn.MaxPool2d(kernel_size=1, stride=1))

        self.layer22 = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=(3, 3), stride=1, padding=(2, 2)),
            nn.MaxPool2d(kernel_size=1, stride=1))

        self.fc1 = nn.Linear(6944, 512)
        self.fc2 = nn.Linear(512, 2)

    def forward(self, x):
        x = x.float()
        out11 = self.layer11(x)
        out12 = self.layer12(x)

        out21 = self.layer21(out11)
        out21 = out21.reshape(out21.size(0), -1)
        out22 = self.layer22(out12)
        out22 = out22.reshape(out22.size(0), -1)
        out = torch.cat((out21, out22), 1)
        out = self.fc1(out)
        out = F.dropout(out, p=0.5, training=self.training)
        out = F.log_softmax(self.fc2(out), dim=-1)

        return out

I am using SGD optimizer and CrossEntropyLoss with lr=0.001.

Maybe I should try to tune my hyperparameters. Could you share an example of how to use Hyperopt or Hypersearch with a PyTorch CNN?

Thanks for the code!
nn.CrossEntropyLoss applies F.log_softmax and nn.NLLLoss internally, so could you please remove the F.log_softmax from your model or use nn.NLLLoss as the criterion, and rerun the experiment?
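
As a small illustration of the two valid pairings mentioned here (raw logits with `nn.CrossEntropyLoss`, or `F.log_softmax` outputs with `nn.NLLLoss`), this sketch checks on random toy tensors that both give the same loss. The tensor shapes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 2)            # stand-in for the raw output of the last linear layer
target = torch.randint(0, 2, (8,))    # class indices 0 or 1

# Pairing 1: raw logits + CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, target)

# Pairing 2: log-probabilities + NLLLoss
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=-1), target)

print(torch.allclose(loss_ce, loss_nll))
```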

Aah ok. I am just running it. Will update you in some time. Thank you for your suggestions so far :slight_smile:

Did 2 runs so far and the results are definitely better :slight_smile:

roc auc 0.8597332329593886
precision auc 0.1152646061940882

roc auc 0.8950204024201491
precision auc 0.29831894063835274

Thank you so much!!

Phew, I was running out of ideas and couldn’t believe that the model was overfitting that much even with weighted sampling. :wink:
Feel free to post updates on your experiments, as I’m always interested in these imbalanced cases. :slight_smile:

Sure :slight_smile:
I got this using undersampling. I will also try using WeightedRandomSampler and let you know :slight_smile: Thank you once again!!

Do you think a hyperparameter optimization technique can improve the results? I am reading about the different techniques available, like hypersearch and hyperopt, but since I am new to DL I am not able to get a grip on them. Is it possible for you to post a sample implementation of hypersearch or hyperopt for a PyTorch CNN?

These techniques might help, but I’m unfortunately inexperienced in this topic and just played around with some architecture search methods. :frowning: So let’s better wait for some experts and their opinion. :slight_smile:
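
In the meantime, here is a rough sketch of plain random search, which is a simpler technique than Hyperopt and not its API. Everything in it is illustrative: the search ranges are guesses, and `train_and_score` is a hypothetical placeholder that would normally train the CNN with the given settings and return the validation PR-AUC. Here it returns a synthetic score so the sketch runs on its own.

```python
import math
import random


def sample_config(rng):
    """Draw one hyperparameter configuration (ranges are illustrative)."""
    return {
        "lr": 10 ** rng.uniform(-4, -1),               # log-uniform learning rate
        "batch_size": rng.choice([32, 64, 128, 256]),
        "dropout": rng.uniform(0.1, 0.6),
    }


def train_and_score(config):
    """Hypothetical placeholder for: train the model, return validation PR-AUC.
    Returns a synthetic score so that the example is runnable."""
    return -abs(math.log10(config["lr"]) + 2.5) - abs(config["dropout"] - 0.3)


def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((train_and_score(config), config))
    best = max(trials, key=lambda t: t[0])
    return best, trials


(best_score, best_config), trials = random_search(n_trials=20)
print(best_score, best_config)
```

With real training in place of the placeholder, each trial would be scored on a validation fold, never on the test fold.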

Oh Okay sure :slight_smile:

Hey!! Happy Christmas :slight_smile: Hope your Christmas is going well!!

In case you are working this week, I would like to ask one more question. I managed to get the PR-AUC to around 23%. Do you have any ideas for increasing it further? For example:

  • making the network more complex or simple

  • or increasing the batch size etc.
