How to check and read a confusion matrix?

This question may seem a little odd: I am printing a multi-class confusion matrix and the output is not completely understandable to me. I took the confusion matrix code from this helpful forum and changed it a little bit: I have put the whole thing into a function and I fetch the number of classes from my dataset.

import torch

n_classes = 4   # number of classes, fetched from my dataset
batch_size = 32 # nb_samples

output = torch.randn(batch_size, n_classes)          # stand-in for the model output (logits; argmax is the same before or after softmax)
target = torch.randint(0, n_classes, (batch_size,))  # ground-truth labels

def confusion_matrix(preds, labels):
    # rows = predicted class, columns = target class
    preds = torch.argmax(preds, 1)
    conf_matrix = torch.zeros(n_classes, n_classes)
    for p, t in zip(preds, labels):
        conf_matrix[p, t] += 1

    print(conf_matrix)
    TP = conf_matrix.diag()
    for c in range(n_classes):
        idx = torch.ones(n_classes).bool()
        idx[c] = False  # mask out the current class
        # TN: samples that are neither predicted as class c nor belong to class c
        TN = conf_matrix[idx.nonzero()[:, None], idx.nonzero()].sum()
        # FP: predicted as class c but belonging to another class (row c without the diagonal)
        FP = conf_matrix[c, idx].sum()
        # FN: belonging to class c but predicted as another class (column c without the diagonal)
        FN = conf_matrix[idx, c].sum()

        sensitivity = TP[c] / (TP[c] + FN)
        specificity = TN / (TN + FP)

        print('Class {}\nTP {}, TN {}, FP {}, FN {}'.format(
            c, TP[c], TN, FP, FN))
        print('Sensitivity = {}'.format(sensitivity))
        print('Specificity = {}'.format(specificity))

confusion_matrix(output, target)

Now, first: is this batch_size = 32 the same as the one in
dataloader_train = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True, num_workers=2)?
Second, this is the output that I got from my code. I have two sets, train (around 1000 images per class) and val (around 8 images per class):

 tensor([[3., 2., 0., 3.],
        [1., 0., 1., 1.],
        [3., 2., 4., 2.],
        [5., 2., 1., 2.]])
Class 0
TP 3.0, TN 15.0, FP 5.0, FN 9.0
Sensitivity = 0.25
Specificity = 0.75
Class 1
TP 0.0, TN 23.0, FP 3.0, FN 6.0
Sensitivity = 0.0
Specificity = 0.8846153616905212
Class 2
TP 4.0, TN 19.0, FP 7.0, FN 2.0
Sensitivity = 0.6666666865348816
Specificity = 0.7307692170143127
Class 3
TP 2.0, TN 16.0, FP 8.0, FN 6.0
Sensitivity = 0.25
Specificity = 0.6666666865348816

I know that from this matrix I can see that my classifier made 3 correct first-class predictions, 0 second-class predictions, 4 third-class predictions and 2 fourth-class predictions.
I have studied some sites about the multi-class confusion matrix, but I am not sure about the numbers in the matrix or the class-wise TP, TN, FP and FN. Also, I have 1000 images, so why are these numbers so small? And I have to multiply the sensitivity and specificity values by 100 to get percentages, right? Thanks for the help, this confusion matrix is really confusing to me.

If you would like to create the confusion matrix for your whole validation dataset, you should update it batch-wise. Currently you get these small numbers because you've only passed a single batch to create the conf mat.

The stats (TP, TN, FP and FN) are computed class-wise from the confusion matrix.
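
For example, taking the 4x4 matrix you printed (rows are predictions, columns are targets, 32 samples in total), the Class 0 stats can be checked by hand. This is just a quick sketch with the matrix values copied from your printout:

import torch

conf_matrix = torch.tensor([[3., 2., 0., 3.],
                            [1., 0., 1., 1.],
                            [3., 2., 4., 2.],
                            [5., 2., 1., 2.]])

c = 0
TP = conf_matrix[c, c]                 # 3: predicted class 0 and really class 0
FP = conf_matrix[c].sum() - TP         # 5: predicted class 0 but really another class (row 0)
FN = conf_matrix[:, c].sum() - TP      # 9: really class 0 but predicted as another class (column 0)
TN = conf_matrix.sum() - TP - FP - FN  # 15: everything else

print(TP / (TP + FN))  # sensitivity = 3 / 12 = 0.25
print(TN / (TN + FP))  # specificity = 15 / 20 = 0.75

These match the Class 0 numbers in your output, and since sensitivity and specificity are ratios, you would indeed multiply them by 100 to report them as percentages.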

Yeah, that's what you told me, but is this batch size the same as the one mentioned in dataloader_train?
Am I just passing dataloader_train into this matrix and not dataloader_test? Do I have to create another matrix for dataloader_test?

The batch_size in my code example would correspond to the batch size you set in your DataLoader.
To update the conf mat you would have to pass and return it from the method:

def confusion_matrix(preds, labels, conf_matrix):
    preds = torch.argmax(preds, 1)
    for p, t in zip(preds, labels):
        conf_matrix[p, t] += 1
    ...
    return conf_matrix

conf_matrix = torch.zeros(n_classes, n_classes)
for data, target in test_loader:
    output = ...
    conf_matrix = confusion_matrix(output, target, conf_matrix)

Yes, you shouldn’t mix the training, validation and test statistics.

Sorry for the late reply, but I think there is a little mix-up between conf_matrix and conv_matrix, isn't there? :smiley:
We should return conf_matrix and call conf_matrix?
Also, I have a question: do I have to put the confusion_matrix call inside the epoch / images-and-target loop?
I am confused about this stage. Thanks.

Yeah, there were some typos in my post. Sorry for that :wink:

You should call this method in each iteration, passing the output of your model and the current target batch.

Well, thanks, I have done that, but the result ends up being very confusing. This is my code:

for epoch in range(epochs):
  
  running_loss = 0
  model.train()
  for images, labels in dataloader_train:
    
    #steps += 1
    images, labels = images.to(device), labels.to(device)
    
    optimizer.zero_grad()
    
    output = model.forward(images)
    conf_matrix = confusion_matrix(output, labels, conf_matrix)
    p = torch.nn.functional.softmax(output, dim=1)
    prediction = torch.argmax(p, dim=1)
    #loss = torch.nn.functional.nll_loss(torch.log(p), y)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()
    
    running_loss += loss.item()
    
  #if steps % print_every == 0:
  valid_loss = 0
  accuracy = 0
  model.eval()
  for images, labels in dataloader_test:
    optimizer.zero_grad()
    with torch.no_grad():
       
      images, labels = images.to(device), labels.to(device)

      output = model.forward(images)
      conf_matrix = confusion_matrix(output, labels, conf_matrix)
      p = torch.nn.functional.softmax(output, dim=1)
      prediction = torch.argmax(p, dim=1)
      loss = criterion(output, labels)
          
      valid_loss += loss.item()
          
      ps = torch.exp(output)
         
      top_p, top_class = ps.topk(1, dim = 1)
      equals = top_class == labels.view(*top_class.shape)
      accuracy += torch.mean(equals.type(torch.FloatTensor))
        
  print("Epoch: {}/{} " .format(epoch+1, epochs))
  print("Train loss: {:.4f}.. " .format(running_loss/len(dataloader_train)))
  print("Valid loss: {:.4f}.. " .format(valid_loss/len(dataloader_test)))
  print("Accuracy: {:.4f}.. " .format(accuracy/len(dataloader_test)))
  model.train()

So, in this case the result gets printed almost 100 times per epoch (obviously with different values each time), like this:

tensor([[6639., 1154., 1566.,  632.],
        [ 744., 3288.,  726.,  758.],
        [1111.,  772., 3596.,  850.],
        [ 519., 1084., 1266., 4964.]])
Class 0
TP 6639.0, TN 17304.0, FP 3352.0, FN 2374.0
Sensitivity = 0.7366026639938354
Specificity = 0.8377227187156677
Class 1
TP 3288.0, TN 21143.0, FP 2228.0, FN 3010.0
Sensitivity = 0.5220705270767212
Specificity = 0.9046681523323059
Class 2
TP 3596.0, TN 19782.0, FP 2733.0, FN 3558.0
Sensitivity = 0.5026558637619019
Specificity = 0.8786142468452454
Class 3
TP 4964.0, TN 19596.0, FP 2869.0, FN 2240.0
Sensitivity = 0.6890616416931152
Specificity = 0.8722902536392212

But I want this result to be printed only once for each train and validation/test loop, so I have done this:

for images, labels in dataloader_train:
    
    #steps += 1
    images, labels = images.to(device), labels.to(device)
    
    optimizer.zero_grad()
    
    output = model.forward(images)
    conf_matrix = confusion_matrix(output, labels, conf_matrix)
    p = torch.nn.functional.softmax(output, dim=1)
    prediction = torch.argmax(p, dim=1)
    #loss = torch.nn.functional.nll_loss(torch.log(p), y)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()
    
    running_loss += loss.item()
    
  #if steps % print_every == 0:
  valid_loss = 0
  accuracy = 0
  conf_matrix = confusion_matrix(output, labels, conf_matrix)
  model.eval()

But this is out of the loop, so it will not work. Can you guide me on why this is happening? Is it because of the 1000 images? But why? Since there is only dataloader_train here, it should return only the values of the training dataset, right? I am not getting this.

Just remove the print statements inside confusion_matrix.
Currently each call also prints the current stats.
If you just want to print the stats at the end, paste the print statements at the desired place (e.g. after the training loop).
Also, it looks like you are computing the training and validation stats together, which is a bad idea. Usually you would re-create the confusion matrix for each dataset.

If you just want to print the stats at the end, paste the print statements at the desired place (e.g. after the training loop).

So, it will be like this, right?

for epoch in range(epochs):
  
  running_loss = 0
  model.train()
  for images, labels in dataloader_train:
    
    #steps += 1
    images, labels = images.to(device), labels.to(device)
    
    optimizer.zero_grad()
    
    output = model.forward(images)
    conf_matrix = confusion_matrix(output, labels, conf_matrix)
    p = torch.nn.functional.softmax(output, dim=1)
    prediction = torch.argmax(p, dim=1)
    #loss = torch.nn.functional.nll_loss(torch.log(p), y)
    loss = criterion(output, labels)
    loss.backward()
    optimizer.step()
    
    running_loss += loss.item()
    
  #if steps % print_every == 0:
  valid_loss = 0
  accuracy = 0
  model.eval()
  #print(conf_matrix)(**NOT HERE RIGHT?**)
  for images, labels in dataloader_test:
    optimizer.zero_grad()
    with torch.no_grad():
       
      images, labels = images.to(device), labels.to(device)

      output = model.forward(images)
      p = torch.nn.functional.softmax(output, dim=1)
      prediction = torch.argmax(p, dim=1)
      loss = criterion(output, labels)
          
      valid_loss += loss.item()
          
      ps = torch.exp(output)
         
      top_p, top_class = ps.topk(1, dim = 1)
      equals = top_class == labels.view(*top_class.shape)
      accuracy += torch.mean(equals.type(torch.FloatTensor))
  
  print(conf_matrix)      
  print("Epoch: {}/{} " .format(epoch+1, epochs))
  print("Train loss: {:.4f}.. " .format(running_loss/len(dataloader_train)))
  print("Valid loss: {:.4f}.. " .format(valid_loss/len(dataloader_test)))
  print("Accuracy: {:.4f}.. " .format(accuracy/len(dataloader_test)))
  model.train()

Also, it looks like you are computing the training and validation stats together, which is a bad idea. Usually you would re-create the confusion matrix for each dataset.

So, do I have to make two different functions for the confusion matrix? If not, then what should I do? I have removed all the print statements from the confusion_matrix function. Thanks.

You should create a confusion matrix for each dataset, i.e. create the training confusion matrix right after the for epoch loop and print it after the DataLoader loop, then create a validation confusion matrix after model.eval() and print it after the validation DataLoader loop.
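
Roughly like this (just a sketch reusing the names from your code, e.g. dataloader_train, dataloader_test, criterion, n_classes; it assumes the print statements were removed from confusion_matrix so that it only updates and returns the matrix):

for epoch in range(epochs):
    # fresh confusion matrix for the training set of this epoch
    train_conf_matrix = torch.zeros(n_classes, n_classes)
    running_loss = 0
    model.train()
    for images, labels in dataloader_train:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        # .cpu() keeps the index tensors on the same device as the (CPU) matrix
        train_conf_matrix = confusion_matrix(output.detach().cpu(), labels.cpu(), train_conf_matrix)
    print(train_conf_matrix)

    # fresh confusion matrix for the validation set
    val_conf_matrix = torch.zeros(n_classes, n_classes)
    model.eval()
    with torch.no_grad():
        for images, labels in dataloader_test:
            images, labels = images.to(device), labels.to(device)
            output = model(images)
            val_conf_matrix = confusion_matrix(output.cpu(), labels.cpu(), val_conf_matrix)
    print(val_conf_matrix)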

Okay, will do that now, I got it. Thanks a lot.

Sorry to disturb you, @ptrblck, after so many days, but I am not getting this part:

TP = conf_matrix.diag()
for c in range(n_classes):
    idx = torch.ones(n_classes).bool()
    idx[c] = False
    TN = conf_matrix[idx.nonzero()[:, None], idx.nonzero()].sum()
    FP = conf_matrix[c, idx].sum()
    FN = conf_matrix[idx, c].sum()

    sensitivity = TP[c] / (TP[c] + FN)
    specificity = TN / (TN + FP)

    print('Class {}\nTP {}, TN {}, FP {}, FN {}'.format(c, TP[c], TN, FP, FN))

What are we using c for? c seems to be the key to printing the TP here. Thank you.

c is used as the index of the current class.
The metrics are computed class-wise, i.e. each class will have its own TP, TN, FP and FN.
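
For example, for c = 1 the mask idx selects every class except 1, so the slicing picks out row 1 and column 1 of the matrix without the diagonal entry. A tiny standalone sketch (with a dummy matrix, just to show the indexing):

import torch

n_classes = 4
conf_matrix = torch.arange(16.).view(n_classes, n_classes)  # dummy values; rows = predictions, columns = targets

c = 1
idx = torch.ones(n_classes).bool()
idx[c] = False  # tensor([ True, False,  True,  True])

FP = conf_matrix[c, idx].sum()   # row c without the diagonal: predicted as c, actually another class
FN = conf_matrix[idx, c].sum()   # column c without the diagonal: actually c, predicted as another class
TN = conf_matrix[idx.nonzero()[:, None], idx.nonzero()].sum()  # everything outside row c and column c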
