Loss is increasing and accuracy is decreasing

Lucky_Magna · May 22, 2021, 3:37pm

I’m trying to train a pneumonia classifier using ResNet34. During training, the loss increases and the accuracy drops drastically on both the training and validation sets. What might be the reason behind this?

import torch
from torch import nn, optim
from torchvision import models
from tqdm import tqdm
# Accuracy is assumed to come from torchmetrics; newer releases require
# Accuracy(task="multiclass", num_classes=2) instead of Accuracy().
from torchmetrics import Accuracy

device = 'cuda' if torch.cuda.is_available() else 'cpu'

def train(model, dataloaders, loss, optimizer, epochs=5):
  train = dataloaders['train']
  valid = dataloaders['valid']
  metric = Accuracy().to(device)
  for epoch in tqdm(range(epochs), desc="EPOCHS : "):
    # Training phase
    model.train()
    metric.reset()  # start each phase with an empty accuracy accumulator
    cst = 0
    for x, y in tqdm(train, leave=True, desc="Training : "):
      optimizer.zero_grad()
      x = x.to(device)
      y = y.to(device)
      preds = model(x)
      metric(preds.argmax(dim=1), y)
      cost = loss(preds, y)
      cst += cost.item()
      cost.backward()
      optimizer.step()
    acc = metric.compute()
    cst /= len(train)
    print(f'Train loss : {cst} \t Train acc : {acc}')
    # Validation phase: no gradients needed
    model.eval()
    metric.reset()  # otherwise validation accuracy would include training batches
    cst = 0
    with torch.no_grad():
      for x, y in tqdm(valid, leave=True, desc="Validation : "):
        x = x.to(device)
        y = y.to(device)
        preds = model(x)
        metric(preds.argmax(dim=1), y)
        cost = loss(preds, y)
        cst += cost.item()
    acc = metric.compute()
    cst /= len(valid)
    print(f'Valid loss : {cst} \t Valid acc : {acc}')
  return model

# Freeze the pretrained backbone and train only the new classification head
model = models.resnet34(pretrained=True)
for param in model.parameters():
  param.requires_grad = False
model.fc = nn.Sequential(
    nn.Dropout(p=.7),
    nn.Linear(in_features=model.fc.in_features, out_features=2),
    nn.LogSoftmax(dim=1)
)
model = model.to(device)

LR = 3e-3
WD = 1e-4
loss = nn.NLLLoss()  # expects log-probabilities, hence the LogSoftmax head
optimizer = optim.Adam(model.parameters(), lr=LR, weight_decay=WD)
md = train(model, dataloaders, loss, optimizer, epochs=5)

krishna511 · May 22, 2021, 3:46pm

Well, the obvious answer is that nothing is wrong here: if the model is not suited to your data distribution, it simply won’t give the desired results. Also, I think you should reframe your question: if the loss increases, the accuracy will certainly decrease.
That’s just my opinion; I may be missing the point here.

Lucky_Magna · May 22, 2021, 4:17pm

I tried different architectures as well, but the result is the same. And I don’t think I should reframe the question, as you can see from the screenshot.

krishna511 · May 22, 2021, 4:20pm

@Lucky_Magna By reframing, I meant that this is obvious: if the loss decreases, the accuracy will increase.

Lucky_Magna · May 22, 2021, 4:22pm

Can you suggest any other way to solve the problem?

eqy · May 22, 2021, 11:26pm

Can you check the initial loss of your model with random data? It should be around -ln(1/num_classes). If this value is close then it suggests that your model is initialized properly. The next thing to check would be that your data format as input to the model makes sense (e.g., from the perspective of data layout, etc.)
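A minimal sketch of that sanity check, assuming a classifier that outputs log-probabilities and is trained with nn.NLLLoss as in the original post. The helper names and the tiny stand-in model are hypothetical and used only for illustration; in practice you would pass your own model and input shape.

```python
import math

import torch
from torch import nn


def expected_initial_loss(num_classes: int) -> float:
    """Loss of a model that predicts a uniform distribution: -ln(1/num_classes)."""
    return -math.log(1.0 / num_classes)


@torch.no_grad()
def initial_loss_on_random_data(model, loss_fn, num_classes, input_shape, batch_size=256):
    """Measure the loss of an untrained model on random inputs and random labels."""
    model.eval()
    x = torch.randn(batch_size, *input_shape)
    y = torch.randint(0, num_classes, (batch_size,))
    return loss_fn(model(x), y).item()


torch.manual_seed(0)

# Hypothetical stand-in for the real network: a freshly initialised linear head.
toy_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 2),
    nn.LogSoftmax(dim=1),
)

measured = initial_loss_on_random_data(toy_model, nn.NLLLoss(), 2, (3, 32, 32))
print(f"expected: {expected_initial_loss(2):.4f}  measured: {measured:.4f}")
```

For two classes the expected value is ln(2), about 0.693. A measured value far above that would hint at a problem with the initialization or the output layer rather than with the data.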

From here, if your loss is not even going down initially, you can try simple tricks like decreasing the learning rate until it starts training. If the loss is going down initially but stops improving later, you can try things like more aggressive data augmentation or other regularization techniques.
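One way to try this is a few very short runs at different learning rates, comparing the first and last loss of each. The sketch below uses a hypothetical synthetic dataset and a toy linear model so that it runs on its own; the function name short_run and the chosen learning rates are assumptions, not values from this thread.

```python
import torch
from torch import nn, optim


def short_run(lr, steps=50, seed=0):
    """Train a toy classifier for a few steps and return (first_loss, last_loss)."""
    torch.manual_seed(seed)
    # Hypothetical data: the label depends only on the sign of the first feature.
    x = torch.randn(256, 10)
    y = (x[:, 0] > 0).long()

    model = nn.Sequential(nn.Linear(10, 2), nn.LogSoftmax(dim=1))
    loss_fn = nn.NLLLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        cost = loss_fn(model(x), y)
        cost.backward()
        optimizer.step()
        losses.append(cost.item())
    return losses[0], losses[-1]


# Try progressively smaller learning rates and see which ones make progress.
for lr in (3e-2, 3e-3, 3e-4):
    first, last = short_run(lr)
    print(f"lr={lr:g}: first loss {first:.4f}, last loss {last:.4f}")
```

With a real model, the same loop over a handful of batches is usually enough to see whether a given learning rate lets the loss move down at all.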

Lucky_Magna · May 23, 2021, 2:53am

@eqy Will try it and let you know.

Lucky_Magna · May 23, 2021, 3:17am

@eqy Loss of the model with random data is very close to -ln(1/num_classes), as you mentioned. As for the data, it is in the right format.

eqy · May 23, 2021, 3:43am

Great, what does the loss curve look like with smaller learning rates?

Lucky_Magna · May 23, 2021, 3:50am

@eqy I changed the model from resnet34 to resnet18. The loss is stable, but the model is learning very slowly. The accuracy starts at around 25% and rises eventually, but very slowly: it takes around 10 to 15 epochs to reach 60% accuracy. I tried increasing the learning rate, but the results don’t differ much.

eqy · May 23, 2021, 4:34am

OK, that sounds normal. At this point I would see whether there are any data augmentations that make sense for your dataset, and try other model architectures as well.

Lucky_Magna · May 23, 2021, 4:49am

@eqy OK, let me explain the project I’m working on. I’m trying to classify pneumonia patients from X-ray images. Below are the transforms I’m currently using.

from torchvision import transforms as T  # assumed alias for torchvision.transforms

transform = {
    'train': T.Compose([
        T.Resize(size=(224, 224)),
        T.RandomAffine(30),
        T.RandomInvert(p=1),  # p=1 inverts every image
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'valid': T.Compose([
        T.Resize(size=(224, 224)),
        T.RandomInvert(p=1),
        T.ToTensor(),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
}

Before you ask why I am using the Invert transform on the validation set: I think this transform helps capture the pneumonia regions in the X-ray images, so I used it on the validation and test sets as well (if that is a bad idea, please correct me). After applying the transforms, the images look something like this:

Lucky_Magna · May 25, 2021, 4:53am

@eqy Solved it! I forgot to shuffle the dataset. It is overfitting to one class in the whole dataset. Thanks for the help though.
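The thread does not show how the data was loaded, so the following is only a sketch of the failure mode, assuming the samples were stored class by class and fed through a torch.utils.data.DataLoader. The dataset here is hypothetical synthetic data. Without shuffling, every batch contains a single class; with shuffle=True, the batches are mixed.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical class-sorted dataset: 100 samples of class 0 followed by 100 of class 1.
features = torch.randn(200, 4)
labels = torch.cat([
    torch.zeros(100, dtype=torch.long),
    torch.ones(100, dtype=torch.long),
])
dataset = TensorDataset(features, labels)


def classes_per_batch(loader):
    """Return the number of distinct labels seen in each batch."""
    return [len(torch.unique(y)) for _, y in loader]


ordered = DataLoader(dataset, batch_size=20, shuffle=False)
shuffled = DataLoader(dataset, batch_size=20, shuffle=True)

print("unshuffled:", classes_per_batch(ordered))
print("shuffled:  ", classes_per_batch(shuffled))
```

Printing the label counts of the first few batches like this is a quick way to spot an unshuffled training set before starting a long run.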

Nahil_Sobh · May 29, 2021, 11:27am

Nice.
@Lucky_Magna Could you please share the performance of your final model?
For example, the training and validation loss plots, and possibly accuracy plots as well.

Thx.

Lucky_Magna · May 29, 2021, 4:42pm

@Nahil_Sobh I posted the code on my GitHub account; you can see the performance there.

github.com

narayana8799/Pneumonia-Detection-using-Pytorch/blob/master/Pneumonia Detection.ipynb

Nahil_Sobh · May 29, 2021, 5:30pm

Thanks for sharing.

Lucky_Magna · May 29, 2021, 5:32pm

@Nahil_Sobh Share your model performance once you have optimized it.
Thank you.