Model .eval() problem

Hi,
I ran into a strange bug:

My model: EfficientDet-D4 (following this repo)

During training I call model.train() and switch to model.eval() for the validation step, and that works as expected.

However, in the test phase, my code is:

import torch
from torch import nn
from efficientdet.model import Classifier
# EfficientDetBackbone is defined in the same repo

model = EfficientDetBackbone(num_classes=len(params.obj_list), compound_coef=4,
                             ratios=eval(params.anchors_ratios), scales=eval(params.anchors_scales))
# Replace the stem conv so the network accepts 4-channel input
model.backbone_net.model._conv_stem.conv = nn.Conv2d(4, 48, kernel_size=(3, 3), stride=(2, 2), bias=False)
model.classifier.header.pointwise_conv.conv = nn.Conv2d(224, 9, kernel_size=(1, 1), stride=(1, 1))
# Swap in a single-class classifier head
model.classifier = Classifier(in_channels=model.fpn_num_filters[4], num_anchors=model.num_anchors,
                              num_classes=1,
                              num_layers=model.box_class_repeats[4],
                              pyramid_levels=model.pyramid_levels[4])
model.load_state_dict(torch.load(weights_path), strict=False)
model.requires_grad_(False)
model.eval()

with torch.no_grad():
    for step, data in enumerate(test_generator):  # avoid shadowing the built-in iter()
        imgs = data['img']
        _, regression, classification, anchors = model(imgs)

The results were really bad: the model always returned bounding boxes in the corners of the image. When I commented out the model.eval() line, the results were really good. Any idea why?

model.eval() disables dropout and makes all batchnorm layers use their internal running stats (custom modules might also change their behavior based on the self.training flag, which eval() sets to False).
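A minimal sketch of the flag change, using a made-up toy module (`toy`) rather than the model from the question:

```python
import torch
from torch import nn

# Hypothetical toy module, only here to show the mode switch.
toy = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

toy.eval()  # equivalent to toy.train(False); applies to every submodule
flags_eval = [m.training for m in toy.modules()]
# With dropout disabled, two forward passes give identical outputs.
deterministic = torch.equal(toy(x), toy(x))

toy.train()  # back to training mode; dropout is active again
flags_train = [m.training for m in toy.modules()]

print("training flags after eval(): ", flags_eval)
print("training flags after train():", flags_train)
print("eval outputs identical:", deterministic)
```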
If your validation loss and predictions are bad, this might point to bad running estimates in the batchnorm layers. You would see this if, e.g., the training and validation data are not processed in the same way (especially the normalization), or if they are sampled from a different data domain and thus have different statistics.
To counter this effect, you could play around with the momentum of the batchnorm layers to smooth the statistics a bit.
Also, if not already used, you should shuffle the training dataset to avoid adding a bias for the stats updates based on the last samples in your training dataset.
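As a rough sketch of the effect (a single toy layer and made-up input, not your model): in train mode the layer normalizes with the current batch stats, while in eval mode it uses the running stats, so the two outputs diverge whenever the running stats don't match the data. The momentum value of 0.01 at the end is just an example, not a recommendation.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A single batchnorm layer standing in for the ones inside a real model.
bn = nn.BatchNorm2d(3, momentum=0.1)  # 0.1 is the PyTorch default

# Made-up input whose statistics are far from the initial running stats
# (running_mean starts at 0, running_var at 1).
x = torch.randn(8, 3, 16, 16) * 5.0 + 10.0

bn.train()
out_train = bn(x)  # normalizes with the batch stats and updates the running stats

bn.eval()
out_eval = bn(x)   # normalizes with the running stats only

print("train-mode output mean:", out_train.mean().item())
print("eval-mode output mean: ", out_eval.mean().item())
print("running_mean after one update:", bn.running_mean)

# Changing the momentum on every batchnorm layer of a (toy) model.
toy_model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())
for module in toy_model.modules():
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        module.momentum = 0.01  # smaller value: running stats change more slowly
```

In PyTorch the update is running = (1 - momentum) * running + momentum * batch_stat, so a smaller momentum gives each batch less weight in the running estimate.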

Thank you for your reply,

I am sure that the data processing is the same, and I already shuffle the data during training.

I think your point about the batchnorm problem makes sense. I will check running_mean and running_var, then try adjusting the momentum. However, I didn't expect it to have that much of an impact.
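A rough sketch of that check, assuming the model uses the standard nn.BatchNorm layers; `summarize_batchnorm_stats` and the toy model are made-up names for illustration:

```python
import torch
from torch import nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def summarize_batchnorm_stats(model):
    """Collect the range of the running stats for every batchnorm layer."""
    rows = []
    for name, module in model.named_modules():
        # running_mean is None when track_running_stats=False
        if isinstance(module, BN_TYPES) and module.running_mean is not None:
            rows.append({
                "name": name,
                "mean_min": module.running_mean.min().item(),
                "mean_max": module.running_mean.max().item(),
                "var_min": module.running_var.min().item(),
                "var_max": module.running_var.max().item(),
                "num_batches_tracked": int(module.num_batches_tracked),
            })
    return rows

# Hypothetical toy model; pass the real model here instead.
toy_model = nn.Sequential(nn.Conv2d(4, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())
rows = summarize_batchnorm_stats(toy_model)
for row in rows:
    print(row)
```

A layer that still shows a mean of 0, a variance of 1, and num_batches_tracked of 0 after loading a checkpoint is at its initial values. Since load_state_dict(..., strict=False) skips non-matching keys without raising, printing its return value (missing_keys and unexpected_keys) might also be worth doing.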

This is the result with model.train():

{'rois': array([[448.55865, 435.95306, 648.1225 , 714.6619 ]], dtype=float32),
 'class_ids': array([0], dtype=int64),
 'scores': array([0.9849996], dtype=float32)}

And this is the result with model.train(False):

{'rois': array([[   0.      ,  921.97406 ,  152.82751 , 1023.      ],
       [ 929.1798  ,  921.1621  , 1023.      , 1023.      ],
       [   0.      ,    0.      ,  159.41663 ,   97.29772 ],
       [ 874.1246  ,    0.      , 1023.      ,   99.22618 ],
       [ 719.88806 ,  820.15857 , 1023.      , 1023.      ],
       [   0.      ,    0.      ,   28.660963,   59.322357],
       [ 991.6581  ,  965.65967 , 1023.      , 1023.      ]],
      dtype=float32),
 'class_ids': array([0, 0, 0, 0, 0, 0, 0], dtype=int64),
 'scores': array([0.22587009, 0.22511227, 0.21677688, 0.2106454 , 0.20411198,
       0.20127359, 0.20020929], dtype=float32)}
