Hello,

I am using a pretrained resnet50 to classify some images. My problem is that when I had both model.train and model.eval in the same training function, the accuracies were fine (about 65% train and validation accuracy), but when I separated them into different functions (one for model.train and one for model.eval), the validation accuracy dropped to 20% and it stays constant across epochs. Does anyone have an idea of what’s happening?

I am quite new to all this and I don’t know why it behaves like that.

There can be many different causes of this (e.g., inadvertently using different transformations for the validation data vs. the training data). Can you post a code snippet of the evaluation functions?

Yes sure.

The transformations I used are these:

```
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor()])
}
```

**The functions:**

```
def train_model(model, dataloaders, criterion, optimizer, scheduler, batch_size=5, num_epochs=10):
    since = time.time()
    val_acc_history = []
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    #pdb.set_trace()
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training and validation phase
        for phase in ['train']:#, 'val']:
            if phase == 'train':
                model.train() # Set model to training mode
            running_loss = 0.0
            running_corrects = 0
            average_precis_train = 0.001
            average_precis_train_per_class = 0.001
            loss_values = []
            gr_truth_array = np.array([]) #convert to int dtype
            preds_array = np.array([])
            gr_truth_array = gr_truth_array.astype(int)
            preds_array = preds_array.astype(int)
            average_precision_array = np.array([]).astype(float)
            print('Iterating over data:')
            for batch_idx, (inputs, labels) in enumerate(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device).float()
                gt_data = labels
                gt_data = gt_data.to(device)
                gt_data = gt_data.cpu().data.numpy()
                #average_precision_array = []
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                #pdb.set_trace()
                if phase == 'train':
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        outputs = outputs.cpu()#.data.numpy()
                        preds = outputs.cpu().data.numpy()
                        preds = np.round(preds) #set a condition for binary
                        preds_int = preds.astype(int)
                        gt_data_np = np.round(gt_data)
                        gt_data_int = gt_data_np.astype(int)
                        gt_data = torch.from_numpy(gt_data_np)
                        loss = criterion(outputs, gt_data)
                        gr_truth_array = np.append(gr_truth_array, gt_data_int)
                        preds_array = np.append(preds_array ,preds_int)
                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()
                # statistics
                gr_truth_array = np.reshape(gr_truth_array, (-1, 40))
                preds_array = np.reshape(preds_array, (-1, 40))
                running_loss += loss.item() * inputs.size(0)
                running_corrects += f1_score(gt_data, preds, average="samples")
            if phase == 'train':
                scheduler.step()
            average_precis_train += average_precision_score(gr_truth_array, preds_array, average= "macro")
            average_precis_train_per_class += average_precision_score(gr_truth_array, preds_array, average=None)
            average_precision_array = np.append(average_precision_array, average_precis_train_per_class)
            #pdb.set_trace()
            av_precis_array = [j for i in zip(average_precision_array, attributes) for j in i]
            av_precis_array = np.array(av_precis_array)
            print("Average precision Training:", average_precis_train)
            print("Average precision per Class Training:", av_precis_array)
            #pdb.set_trace()
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects / len(dataloaders[phase].dataset) #running_corrects.float()
            epoch_acc = np.round(epoch_acc, decimals=4)
            print('{} Loss: {:.4f}'.format(phase, epoch_loss))
            print("Acc:", epoch_acc)
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
```

The evaluation is almost the same, but the model is set to `model.eval()` and I use `with torch.no_grad():` instead of `set_grad_enabled`.

I see the condition for the `model.train()` statement in the code, but it looks like `model.eval()` doesn’t have a corresponding branch?

Ok can you explain a bit more? Is this what is causing this?

I’m not sure this is the issue yet, but I don’t see `model.eval()` anywhere in the code you posted, just `model.train()`.
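For reference, a separate evaluation function usually looks something like the sketch below. This is not the poster's actual evaluation code (that was never shown); the name `evaluate`, the `device` default, and whatever `criterion` gets passed in are assumptions for illustration. The points to compare against are the explicit `model.eval()` call and the `torch.no_grad()` block.

```python
import torch
from torch import nn


def evaluate(model, dataloader, criterion, device="cpu"):
    """Return the mean per-sample loss with the model in eval mode (hypothetical helper)."""
    was_training = model.training
    model.eval()  # BatchNorm uses its running statistics, Dropout is disabled
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():  # no autograd graph is needed for evaluation
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device).float()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            total_loss += loss.item() * inputs.size(0)
            n_samples += inputs.size(0)
    model.train(was_training)  # restore whatever mode the model was in before
    return total_loss / max(n_samples, 1)
```

Note that `torch.no_grad()` only turns off gradient tracking; it does not change how BatchNorm or Dropout layers behave, so it is not a substitute for `model.eval()`.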


Have you inspected the outputs of the model to see if they behave strangely during validation? For example, are they stuck at the same output (or the same class) for every example? Does the validation accuracy change at all between epochs?
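A rough way to run that inspection is sketched below. It assumes the model outputs are per-label scores in [0, 1] that get thresholded at 0.5 (similar to the `np.round` in the training code); `summarize_predictions` is a hypothetical helper name, not something from the original code.

```python
import torch


def summarize_predictions(outputs, threshold=0.5):
    """Summarize how much the predictions vary across the samples of one batch."""
    preds = (outputs > threshold).int()
    # Number of distinct predicted label vectors in the batch
    unique_rows = torch.unique(preds, dim=0).size(0)
    # Spread of the raw scores across the batch, averaged over labels
    mean_std = outputs.std(dim=0).mean().item()
    return {"unique_rows": unique_rows, "mean_std_across_batch": mean_std}
```

If `unique_rows` is 1 for every validation batch, or the spread is close to zero, the model is most likely predicting the same thing regardless of the input.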

I will and I will let you know.

The accuracy stays the same in every epoch

What happens when you remove the `model.load_state_dict(best_model_wts)`? It looks like the best model is never updated, so this may just return the same model every iteration.
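To illustrate what "updating the best model" would mean: in the posted function `best_model_wts` is copied once before training and never reassigned, so loading it at the end restores the initial weights. A minimal sketch of the usual pattern follows; `update_best`, the toy `nn.Linear` model, and the accuracy values are all made up for illustration.

```python
import copy

import torch
from torch import nn


def update_best(model, val_acc, best_acc, best_model_wts):
    """Keep a copy of the weights whenever validation accuracy improves."""
    if val_acc > best_acc:
        return val_acc, copy.deepcopy(model.state_dict())
    return best_acc, best_model_wts


model = nn.Linear(2, 1)
best_acc, best_wts = 0.0, copy.deepcopy(model.state_dict())
for val_acc in [0.20, 0.35, 0.30]:  # hypothetical per-epoch validation accuracies
    with torch.no_grad():
        model.weight.add_(1.0)  # stand-in for one epoch of training
    best_acc, best_wts = update_best(model, val_acc, best_acc, best_wts)

model.load_state_dict(best_wts)  # now restores the epoch with the best accuracy
```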

I took it out, but it didn’t work. Nothing changes.

The accuracy stays the same again

Ok, then can you verify the data is changing along with the model predictions during validation? Or are the predictions the same regardless of the input?

It seems that the outputs change with every iteration, so I guess there is no issue there

You might want to also add a sanity check that the model parameters are changing between validation epochs.

Can you tell me how to do that? Maybe give me an example or something?

This thread gives an example of how to count the number of parameters in the model: "How do I check the number of parameters of a model?" (PyTorch Forums).

If you want to check that the parameters are changing, you can try printing the sum of the parameters rather than the count and see if this is changing between training epochs.

Thank you very much. I’ll try it tomorrow

Hello again,

So in this line of code

`def count_parameters(model): return sum(p.numel() for p in model.parameters() if p.requires_grad)`

since it returns the number of parameters, I should just take out the `numel()` in order to get the sum of their values, right?

Something like that. You might need to do a second sum if you end up with just a list of summed parameters for each layer (or you can just compare them directly if the ordering is the same).
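As a rough sketch of that idea (the helper names `param_checksum` and `per_tensor_sums` are made up here, and the tiny `nn.Linear` model only stands in for the real network): replace `p.numel()` with `p.sum()`, then either add everything up into one number or keep one sum per parameter tensor.

```python
import torch
from torch import nn


def param_checksum(model):
    """Sum of all trainable parameter values, as a cheap change detector."""
    return sum(p.detach().double().sum().item()
               for p in model.parameters() if p.requires_grad)


def per_tensor_sums(model):
    """One sum per named parameter tensor, to see which layers change."""
    return {name: p.detach().double().sum().item()
            for name, p in model.named_parameters() if p.requires_grad}


# Toy usage: the checksum should differ before and after an optimizer step.
model = nn.Linear(3, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
before = param_checksum(model)
loss = model(torch.ones(1, 3)).sum()
loss.backward()
optimizer.step()
after = param_checksum(model)
print(before, after)
```

Printing the checksum at the start of each validation pass should show a different number every epoch if training is actually updating the model you evaluate. A sum can in principle stay the same by cancellation, so the per-tensor version is the safer comparison when the total looks suspicious.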