I have a very mysterious question about semantic segmentation.

I trained a model for semantic segmentation and obtained a set of parameters that is not bad.
When I use model.eval(), reload these parameters, and feed the test set into the model, I get nothing except a fully dark picture.
But when I set lr=0.0 and send the test set through the training stage, I get a decent picture. This is surprising.
I have checked the input and it is fine; the problem is the output. So what happens inside the model when I use model.eval()?
import torch
from torch.autograd import Variable
from torch.utils.data import DataLoader
# test_set, input_transform, target_transform, color_transform and image_transform
# come from my own project code

def evaluate(model):
    # load the trained weights, then switch to evaluation mode
    model.load_state_dict(torch.load('segnet-025-0000.pth'))
    model.eval()

    loader = DataLoader(test_set('data', input_transform, target_transform),
                        num_workers=4, batch_size=1, shuffle=False)

    for i, (image, path) in enumerate(loader):
        image = Variable(image).cuda()

        outputs = model(image)  # #### this is where something goes wrong

        # argmax over the class dimension, then colorize and save the prediction
        out = color_transform(outputs[0].data.max(0)[1])
        image = image_transform(out)

        root = path[0]
        image.save(root + '.png')
        print(i)

Calling model.eval() sets all nn.Modules to evaluation mode, i.e. some layers like BatchNorm and Dropout change their behavior.
BatchNorm layers, for example, no longer compute the batch statistics and thus don't update the running stats. They just apply the already estimated running stats to the test samples.
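
A small sketch of what changes between the two modes (the shapes and numbers below are just made up for illustration):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(4, 3, 8, 8) * 5 + 10      # data far from zero mean / unit variance

bn.train()
_ = bn(x)                                  # normalizes with batch stats and updates running_mean / running_var
print(bn.running_mean)                     # has moved towards the batch mean

bn.eval()
out = bn(x)                                # only applies the stored running stats
print(out.mean().item(), out.std().item())  # not well normalized if the running stats are off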

If you call your model with some test samples using a learning rate of 0, the running stats in each BatchNorm layer will be updated using the test set. This might yield better results if the data domains of your training and test sets differ.
However, I wouldn't recommend this approach for a model evaluated with some kind of accuracy metric, since it's data leakage in my opinion. It's not really a problem for a generative model in my opinion, but that's probably another topic. Maybe you could play around with the momentum argument of BatchNorm to counter this effect.
Also, what is your batch size? It is usually advised to use a batch size of >=64.
If you have a smaller batch size, you might want to have a look at nn.GroupNorm (in master) or nn.InstanceNorm.
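
As a rough illustration of these options (the channel count below is just a hypothetical example):

import torch.nn as nn

num_channels = 64                                             # hypothetical channel count
bn = nn.BatchNorm2d(num_channels, momentum=0.01)              # smaller momentum -> running stats move more slowly
gn = nn.GroupNorm(num_groups=8, num_channels=num_channels)    # per-sample, per-group stats, independent of batch size
inorm = nn.InstanceNorm2d(num_channels)                       # per-sample, per-channel stats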

Just now, when I ran my code in the evaluation stage, I did not use .eval().
And I printed (color_transform(outputs[0].data.max(0)[1])).sum(), and it printed a very large value.

But when I set model.eval() and print (color_transform(outputs[0].data.max(0)[1])).sum(), it prints tensor(0).
It means that for every pixel, channel 0 holds the maximum value over all channels.

This is the same whether the dataset is the training set or the test set.

I set batch_size=4. I will try nn.GroupNorm, but I don't think that is the main cause of my trouble.

Hello, I am sure the trouble is caused by .eval().
I made a little trick for convenience: I downloaded the source of torchvision.models, removed the nn.MaxPool2d() layers from the vgg16(pretrained=True) model, and used F.max_pool2d() in the forward pass instead. I guess this might be the root of my trouble when I use .eval(). Maybe?
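
For reference, a minimal check (with a made-up input shape) that the module and functional versions of max pooling compute the same thing:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
pool_module = nn.MaxPool2d(kernel_size=2, stride=2)
print(torch.equal(pool_module(x), F.max_pool2d(x, kernel_size=2, stride=2)))  # True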

Max pooling doesn't change between eval and train mode, so as long as you can load the model, you should be fine.
I still think it's due to a bad estimate of the running stats, because you get a better result when "retraining" only the batch norm layers.
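
If it helps, here is a small diagnostic sketch (the helper name is made up) to inspect the running stats of all BatchNorm layers in a model:

import torch.nn as nn

def print_bn_stats(model):
    # hypothetical helper: print the magnitude of the running stats of every BatchNorm layer
    for name, module in model.named_modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            print(name,
                  'mean(|running_mean|):', module.running_mean.abs().mean().item(),
                  'mean(running_var):', module.running_var.mean().item())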

Sorry, I do not understand what you mean by "the batch norm layers".

Tomorrow I will install the master version of PyTorch. Then I will try nn.GroupNorm.

If I set lr=0.0 and send the test data through the training stage, will I get better results than testing in eval mode?

I assumed your model has BatchNorm layers in it.
If you used the vgg16 model without BatchNorm, then your observation is indeed strange and it should be caused by another effect.

Usually this should not be the case.
It might happen for the reason I explained in my first answer:

However, if you don't have any BatchNorm layers, it would be interesting to see a small code snippet to help debug this issue. :wink:

thank you …

Hello, I have used nn.GroupNorm but the result is bad:
slow convergence, and it is hard to converge at all. I do not know the reason. I set batch_size=1.

I tried nn.BatchNorm again, and it makes the model converge quickly. I will use model.train() when I run the model on the test set, so nn.BatchNorm will not use the stored running mean and var; it will keep updating the moving average on the test set. If I do this, is it cheating?

If you would like to always use the batch stats, you can just disable the running stats with track_running_stats=False.
It won't be cheating, but there might be some shortcomings.
Using a different batch size at test time than during training might lead to worse accuracy.
If your batch size is > 1, your output won't be deterministic, since the prediction of one sample depends on the batch it is predicted with. Shuffling the test set and predicting it again might also lead to a different accuracy.

If you can handle these effects, or they won't occur because your batch size is 1, I wouldn't see it as cheating.
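
A minimal sketch of that option (the channel count is just an example):

import torch.nn as nn

# With track_running_stats=False the layer keeps no running_mean / running_var buffers,
# so both train() and eval() normalize with the statistics of the current batch.
bn = nn.BatchNorm2d(64, track_running_stats=False)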

Thank you for your help

@mshmoon I have a similar problem. I think it is because, during training, the model was not set to model.eval() in the validation phase. So for this problem, you should retrain it, using .eval() mode while validating and .train() mode while training. Then in the test or inference stage, you just use model.eval(). Now the mysterious part is what the difference between model.train(True) and model.train(False) is. I have posted it here but no one has replied.
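
As far as I know, model.train(mode) just sets the .training flag recursively, so model.train(True) is the same as model.train(), and model.train(False) is the same as model.eval(). A small sketch (the layers are arbitrary):

import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Dropout(0.5))

model.train(True)
print(model.training)    # True  -> batch stats are computed, dropout is active

model.train(False)
print(model.training)    # False -> equivalent to model.eval()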

From the other thread: