Loss and metrics are unusual when using model.eval() in the validation phase

I followed the ImageNet example to complete a project for a semantic segmentation task.
The code structure is as follows:

def main():
    ...
    for epoch in range(start_epoch, epochs + 1):
        _train_epoch(epoch)
        _valid_epoch(epoch)
        ...

def _train_epoch(epoch):
    ...
    model.train()
    for steps, (data, target) in enumerate(train_data_loader):
        ...
        # compute loss and metrics

def _valid_epoch(epoch):
    ...
    model.eval()
    with torch.no_grad():
        for steps, (data, target) in enumerate(valid_data_loader):
            ...
            # compute loss and metrics

The problem is that with model.eval() the metrics are almost zero and the loss is very high, but when I remove model.eval() the metrics and loss look normal. My network contains BatchNorm and Dropout layers, so I think model.eval() is necessary. What could the problem be?

Thanks in advance.


Does the validation data come from approx. the same domain as the train data, e.g. are both “natural” images?
Are you preprocessing the validation data in the same way as the train data, e.g. normalization etc.?

Hi, thanks for your reply.

The validation data come from the same domain as the train data. For transforms, I do resize, vflip, hflip, and normalization on the train dataset, and just resize on the validation and test datasets.
Does the validation dataset need the same normalization as the training data, i.e. using the means and stds calculated on the train dataset? :thinking:

I tried adding normalization to the validation dataset using the mean and std from the train dataset, and it works. But why should we normalize the validation dataset? In my opinion, the validation and test datasets are just there to measure the performance of the model :confused:

You are right in that we use the validation and test data to measure the performance of the model. However, normalizing them (with the training statistics) does not contradict that goal.
The model you’ve trained has learned on normalized input (e.g. values between 0 and 1). In the validation case the model would then receive completely “unknown” inputs in another range (e.g. 0 to 255 for uint8 images). This will often yield a constant output from your model, since the parameters “saturate” on that input range.
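For reference, consistent preprocessing could look roughly like the sketch below (the sizes and the mean/std values are just placeholders; the statistics should be computed on the training set, and for segmentation the random flips have to be applied to the image and the mask jointly):

import torchvision.transforms as T

# Hypothetical statistics; in practice compute mean/std on the *training* set
train_mean, train_std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

train_transform = T.Compose([
    T.Resize((256, 256)),
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.ToTensor(),                        # uint8 [0, 255] -> float [0, 1]
    T.Normalize(train_mean, train_std),  # same statistics ...
])

valid_transform = T.Compose([
    T.Resize((256, 256)),                # no random augmentation here
    T.ToTensor(),
    T.Normalize(train_mean, train_std),  # ... reused for validation and test
])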

You are very kind! :smile:
Can I interpret the “unknown” inputs as coming from a completely different data distribution than the train data? And how should I understand the parameters “saturating”?

Thanks a lot

Yes, I would interpret these samples in a different range as samples from another data distribution.
By saturation I mean that the parameters of your model were trained to create useful features from normalized inputs. If you now provide much larger input values, the magnitude of your activations will most likely be much higher, thus pushing your predictions towards a certain constant value. At least that was my observation. :wink:
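A tiny illustration of that effect (just a sketch): a BatchNorm layer in eval() normalizes with the running statistics collected during training, so inputs in a much larger range produce activations far larger than anything the following layers saw during training:

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)

# Collect running statistics on roughly normalized inputs (zero mean, unit std)
bn.train()
for _ in range(100):
    bn(torch.randn(16, 3, 8, 8))

bn.eval()
normalized = torch.randn(1, 3, 8, 8)        # same range as during training
raw = torch.randn(1, 3, 8, 8) * 50. + 128.  # roughly the range of raw uint8 images

print(bn(normalized).abs().mean())  # comparable to the training activations
print(bn(raw).abs().mean())         # orders of magnitude larger -> downstream layers "saturate"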