Model.eval() accuracy is low

Hello,
I am using a pretrained resnet50 to classify some images. When I had both model.train() and model.eval() in the same training function, the accuracies were fine (about 65% for both training and validation), but when I tried to separate them into different functions (one for model.train() and one for model.eval()), the validation accuracy dropped to 20% and stays constant every epoch. Does anyone have an idea of what’s happening?
I’m quite new to all this and I don’t know why it behaves like that.

There can be many different causes of this (e.g., inadvertently using different transformations for the validation data vs. the training data). Can you post a code snippet of the evaluation functions?

Yes sure.
The transformations I used are these:

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor()])
}

The functions:

def train_model(model, dataloaders, criterion, optimizer, scheduler, batch_size=5, num_epochs=10):
    since = time.time()
    val_acc_history = []
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    #pdb.set_trace()

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train']:#, 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode

            running_loss = 0.0
            running_corrects = 0
            average_precis_train = 0.001
            average_precis_train_per_class = 0.001
            loss_values = []
            gr_truth_array = np.array([])  # convert to int dtype
            preds_array = np.array([])
            gr_truth_array = gr_truth_array.astype(int)
            preds_array = preds_array.astype(int)
            average_precision_array = np.array([]).astype(float)

            print('Iterating over data:')
            for batch_idx, (inputs, labels) in enumerate(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device).float()
                gt_data = labels
                gt_data = gt_data.to(device)
                gt_data = gt_data.cpu().data.numpy()
                #average_precision_array = []

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history only if in train
                #pdb.set_trace()
                if phase == 'train':
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        outputs = outputs.cpu()#.data.numpy()
                        preds = outputs.cpu().data.numpy()
                        preds = np.round(preds)  # set a condition for binary
                        preds_int = preds.astype(int)
                        gt_data_np = np.round(gt_data)
                        gt_data_int = gt_data_np.astype(int)
                        gt_data = torch.from_numpy(gt_data_np)
                        loss = criterion(outputs, gt_data)
                        gr_truth_array = np.append(gr_truth_array, gt_data_int)
                        preds_array = np.append(preds_array, preds_int)

                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    # statistics
                    gr_truth_array = np.reshape(gr_truth_array, (-1, 40))
                    preds_array = np.reshape(preds_array, (-1, 40))
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += f1_score(gt_data, preds, average="samples")

            if phase == 'train':
                scheduler.step()
                average_precis_train += average_precision_score(gr_truth_array, preds_array, average="macro")
                average_precis_train_per_class += average_precision_score(gr_truth_array, preds_array, average=None)
                average_precision_array = np.append(average_precision_array, average_precis_train_per_class)
                #pdb.set_trace()
                av_precis_array = [j for i in zip(average_precision_array, attributes) for j in i]
                av_precis_array = np.array(av_precis_array)
                print("Average precision Training:", average_precis_train)
                print("Average precision per Class Training:", av_precis_array)

            #pdb.set_trace()
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects / len(dataloaders[phase].dataset)  # running_corrects.float()
            epoch_acc = np.round(epoch_acc, decimals=4)

            print('{} Loss: {:.4f}'.format(phase, epoch_loss))
            print("Acc:", epoch_acc)

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

    model.load_state_dict(best_model_wts)
    return model, val_acc_history

The evaluation function is almost the same, but the model is set to model.eval() and I use with torch.no_grad(): instead of set_grad_enabled.

I see the condition for the model.train() statement in the code, but it looks like model.eval() doesn’t have a corresponding branch?

OK, can you explain a bit more? Is this what is causing the problem?

I’m not sure this is the issue yet, but I don’t see model.eval() anywhere in the code you posted, just model.train().

Have you inspected the outputs of the model to see if they behave strangely during validation? For example, are they stuck at the same output (or the same class) for every example? Does the validation accuracy change at all between epochs?
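For example, something like this (a rough sketch, adjust the names to whatever your validation code uses) would show whether the rounded predictions collapse onto the same pattern regardless of the input:

# Rough sketch (hypothetical helper, not from your code): print summary stats of
# the raw outputs and the per-attribute positive counts for a few val batches.
import numpy as np
import torch

def inspect_predictions(model, val_loader, device, max_batches=3):
    model.eval()
    with torch.no_grad():
        for i, (inputs, labels) in enumerate(val_loader):
            if i >= max_batches:
                break
            outputs = model(inputs.to(device)).cpu().numpy()
            preds = np.round(outputs)
            print("batch {}: output mean {:.4f}, std {:.4f}".format(i, outputs.mean(), outputs.std()))
            print("batch {}: positives per attribute: {}".format(i, preds.sum(axis=0)))

If every batch prints the same counts, the model is effectively predicting one fixed attribute pattern during validation.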

I will and I will let you know.

The accuracy stays the same in every epoch

What happens when you remove the model.load_state_dict(best_model_wts)? It looks like the best model is never updated so this may just return the same model every iteration.
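For reference, the usual pattern looks roughly like this (a sketch only; it assumes the validation phase computes an epoch_acc):

# Sketch of the usual "keep the best weights" bookkeeping. Without an update
# like this, best_model_wts stays a copy of the initial weights and
# load_state_dict() just restores those same weights every time.
import copy

best_acc = 0.0
best_model_wts = copy.deepcopy(model.state_dict())

for epoch in range(num_epochs):
    # ... run the training phase ...
    # ... run the validation phase and compute epoch_acc ...
    if epoch_acc > best_acc:
        best_acc = epoch_acc
        best_model_wts = copy.deepcopy(model.state_dict())

model.load_state_dict(best_model_wts)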

I took it out, but it didn’t work. Nothing changes :confused:
The accuracy stays the same again

Ok, then can you verify the data is changing along with the model predictions during validation? Or are the predictions the same regardless of the input?
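For example (another rough sketch, assuming the same dataloaders/model/device names as in your code), printing a simple checksum of each batch next to its outputs makes this easy to eyeball:

# Rough sketch: compare a checksum of the inputs with the outputs for a few
# validation batches; the checksums should differ and so should the outputs.
import torch

with torch.no_grad():
    for i, (inputs, labels) in enumerate(dataloaders['val']):
        outputs = model(inputs.to(device)).cpu()
        print("batch {}: input checksum {:.2f}, first output row {}".format(
            i, inputs.sum().item(), outputs[0, :5].numpy()))
        if i == 4:
            break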

It seems that the outputs change with every iteration, so I guess there is no issue there

You might want to also add a sanity check that the model parameters are changing between validation epochs.

Can you tell me how to do that? Maybe give me an example or something?

This code gives an example of how to count the number of parameters in the model.
How do I check the number of parameters of a model? - PyTorch Forums
If you want to check that the parameters are changing, you can try printing the sum of the parameters rather than the count and see if this is changing between training epochs.

Thank you very much. I’ll try it tomorrow :slightly_smiling_face:

Hello again,
So in this piece of code

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

since it returns the sum of the parameter counts, I should just take out the numel() in order to get the sum of the parameter values instead, right?

Something like that. You might need to do a second sum if you end up with just a list of summed parameters for each layer (or you can just compare them directly if the ordering is the same).
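For example, something along these lines (a sketch: reduce each parameter tensor to a scalar first, then add up the scalars):

# Sketch: sum each parameter tensor down to a scalar with .sum().item() before
# adding, so tensors of different shapes are never added together directly.
def parameter_checksum(model):
    return sum(p.sum().item() for p in model.parameters() if p.requires_grad)

# Print this before and after each training epoch; if the value never changes,
# the optimizer is not actually updating the weights.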

Ok, because I got this error here

----> return sum(p for p in model.parameters() if p.requires_grad)

RuntimeError: The size of tensor a (7) must match the size of tensor b (64) at non-singleton dimension 3