I followed the ImageNet example to build a project for a semantic segmentation task.
The code structure is as follows:
for epoch in range(start_epoch, epochs + 1):
    for steps, (data, target) in enumerate(train_data_loader):
        # loss and metrics
    for steps, (data, target) in enumerate(valid_data_loader):
        # loss and metrics
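Expanded into a minimal runnable sketch (the tiny model, random data, and hyperparameters here are just placeholders for my real setup):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model/data standing in for the real segmentation setup
model = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.Dropout2d(0.5), nn.Conv2d(4, 2, 1))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

train_ds = TensorDataset(torch.randn(8, 3, 16, 16), torch.randint(0, 2, (8, 16, 16)))
valid_ds = TensorDataset(torch.randn(4, 3, 16, 16), torch.randint(0, 2, (4, 16, 16)))
train_loader = DataLoader(train_ds, batch_size=4)
valid_loader = DataLoader(valid_ds, batch_size=4)

for epoch in range(1, 3):
    model.train()  # enable Dropout during training
    for steps, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()

    model.eval()  # disable Dropout for validation
    with torch.no_grad():
        val_loss = sum(criterion(model(d), t).item() for d, t in valid_loader) / len(valid_loader)
```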
The problem: when I use model.eval(), the metrics are almost zero and the loss is very high. But when I remove model.eval(), the metrics and loss seem normal. My network contains Dropout layers, so I think model.eval() is necessary. What could the problem be?
Thanks in advance.
Does the validation data come from approx. the same domain as the train data, e.g. are both “natural” images?
Are you preprocessing the validation data in the same way as the train data, e.g. normalization etc.?
Hi, thanks for your reply.
The validation data come from the same domain as the train data. But for transforms, I do resize, vflip, hflip, and normalization on the train dataset, and just resize on the validation and test datasets.
Does the validation dataset need normalization like the training data, i.e. using the means and stds calculated on the train dataset?
I tried adding normalization to the validation dataset using the mean and std from the train dataset, and it works. But why should we normalize the validation dataset? In my opinion, the validation and test datasets are just there to measure the performance of the model.
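What I ended up doing, roughly (the mean/std values here are placeholders; the real ones are computed on my train dataset). This is the same per-channel operation that torchvision's transforms.Normalize performs:

```python
import torch

# Hypothetical per-channel statistics computed on the *training* set
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(img):
    """Apply the training statistics to any split (train/valid/test)."""
    return (img - mean) / std

# A validation image, already resized and scaled to [0, 1]
img = torch.rand(3, 16, 16)
out = normalize(img)
```

The key point is that the same `mean` and `std` are reused for validation and test, rather than recomputing them per split.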
You are right in that we use the validation and test data to measure the performance of the model. However, normalizing them (with the training statistics) does not conflict with that intention.
The model you’ve trained has learned on normalized input (e.g. values between 0 and 1). Without normalization, at validation time the model would receive completely “unknown” inputs in another range (e.g. 0 to 255 for uint8 images). This will often lead to a constant output from your model, since the parameters “saturate” on that input range.
You are very kind!
Can I interpret the “unknown” inputs as coming from a completely different data distribution than the train data? And how should I understand the parameters “saturating”?
Thanks a lot
Yes, I would interpret these samples in a different range as samples from another data distribution.
By saturation I mean that the parameters of your model were trained to create useful features from normalized input. If you now provide much larger input values, the magnitude of your activations will most likely be much higher, thus pushing your predictions to a certain constant value. At least that was my observation.
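A toy illustration of this saturation effect (the weights here are hand-picked stand-ins for parameters learned on inputs in [0, 1], not a real trained model):

```python
import torch

layer = torch.nn.Linear(10, 1)
with torch.no_grad():
    layer.weight.fill_(0.5)   # pretend these were learned on inputs in [0, 1]
    layer.bias.fill_(-2.5)    # centers the pre-activation for normalized inputs

x_norm = torch.rand(5, 10)    # normalized inputs, as seen during training
x_raw = x_norm * 255.0        # the same data without normalization

# With normalized inputs the pre-activations stay in a sensible range,
# so the sigmoid produces varied outputs. With raw inputs the
# pre-activations are in the hundreds, and the sigmoid collapses to 1.0.
p_norm = torch.sigmoid(layer(x_norm))
p_raw = torch.sigmoid(layer(x_raw))
```

Here `p_norm` varies from sample to sample, while every entry of `p_raw` is saturated at ~1.0: the constant output described above.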