Doubt in the code of Faster RCNN implementation on GitHub

I am trying to implement Faster RCNN for Object Detection. I am following this particular GitHub repo for the implementation. However, I have a doubt about this particular line in the code and just want to clarify whether I have understood things correctly. There is a block of code following this line where a few layers have been fixed by setting requires_grad to False. However, in this particular line in the train function, the base feature extractor is set to eval(), but then the following 2 lines set some specific layers back to training mode.

I have 2 main doubts:

  1. Why is requires_grad set to False in the preceding few lines if the base is going to be used in evaluation mode anyway?
  2. What effect do the 2 lines below self.RCNN_base.eval() have on training in the def train() function?

Any help would be appreciated. Thanks

It seems that the code is using a pretrained model to do transfer learning/fine-tuning, so you don’t want to train the early layers of the net (in this case the RCNN base), only the classification/regression layers.

  1. requires_grad = False means you are freezing those layers during training, while model.eval() prepares the batchnorm/dropout layers to act in evaluation mode, so they aren’t the same thing (see the sketch after this list).

  2. I guess it’s to put classification and regression layers into training mode.
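Something like this toy sketch (not the repo’s actual code, just a minimal model) shows the difference between the two:

```python
import torch.nn as nn

# Minimal toy model, not the repo's actual code
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.Dropout(0.5),
)

# Freezing: no gradients will be computed for these parameters,
# so the optimizer can never update them.
for param in model.parameters():
    param.requires_grad = False

# eval(): switches batchnorm to its running stats and disables dropout,
# but does NOT touch requires_grad at all.
model.eval()

print(model.training)                                    # False, because of eval()
print(any(p.requires_grad for p in model.parameters()))  # False, because of the loop above
```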

@simaiden, thanks for the reply. But if you look at lines 266 and 283, there is a separate function being called to set the batch norm layers to eval mode. Why is this being done if the whole model is anyway being set to eval mode?

Also, are the other layers such as self.RCNN_cls_score and self.RCNN_bbox_pred in lines 243 and 245 respectively being trained? I mean, they ideally should be, because otherwise I don’t think we would have good results (please correct me if I am wrong). However, I got this doubt because these layers are not explicitly set to training mode in the def train() function.

The RCNN_cls_score and RCNN_bbox_pred should be trained, as I cannot see where their requires_grad attribute is manipulated.

As @simaiden said, the eval() and train() calls only change the behavior of batchnorm and dropout layers. They are not freezing the layers.
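A quick toy sketch (not from the repo) to show that eval() doesn’t freeze anything:

```python
import torch
import torch.nn as nn

# Toy example: eval() does not freeze anything -- gradients still flow.
bn = nn.BatchNorm2d(4)
bn.eval()                          # uses running stats now, but parameters still require grad

out = bn(torch.randn(2, 4, 8, 8)).sum()
out.backward()

print(bn.weight.requires_grad)     # True -> an optimizer would still update this layer
print(bn.weight.grad is not None)  # True -> backprop happened despite eval()
```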

Okay, so this is what I think I have understood from the above discussion (please correct me if I am wrong):

  1. The only difference between eval() and train() is the effect they have on the BatchNorm and Dropout layers. Otherwise, if the requires_grad attribute of a layer’s parameters is set to True, that layer will be trained and backprop will still happen for it, irrespective of whether it is set to eval() or train().

  2. Now, coming to the code in the repo: in the def train() function, since self.RCNN_base is set to eval() anyway, am I correct in assuming that the line self.RCNN_base.apply(set_bn_eval) at line 283 is redundant, since the batchnorm layers are already set to eval inside the if condition at line 272?

  1. That’s correct. Note that other (custom) layers might also change their behavior via train()/eval(), if they use the self.training attribute internally.

  2. Not necessarily. L274 sets the base to eval(), while L275+ sets the 5th and 6th modules back to train(). Afterwards all batchnorm layers are reset back to eval() (in particular, if RCNN_base[5] or RCNN_base[6] contain batchnorm layers), as in the sketch below.
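Here is a rough sketch of that ordering, using a toy nn.Sequential in place of RCNN_base and a set_bn_eval helper written the way such helpers usually look (check the repo for the real one):

```python
import torch.nn as nn

def set_bn_eval(m):
    # Typical shape of such a helper: put only batchnorm modules into eval mode.
    if isinstance(m, nn.BatchNorm2d):
        m.eval()

# Toy stand-in for RCNN_base: 7 blocks, each containing a batchnorm layer.
RCNN_base = nn.Sequential(*[
    nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.BatchNorm2d(3), nn.ReLU())
    for _ in range(7)
])

RCNN_base.eval()               # 1) everything, incl. batchnorm, -> eval
RCNN_base[5].train()           # 2) 5th module back to train (its batchnorm is now in train mode!)
RCNN_base[6].train()           #    6th module back to train
RCNN_base.apply(set_bn_eval)   # 3) not redundant: pushes those batchnorm layers back to eval

print(RCNN_base[5][0].training)  # True  -> the conv in the trainable block is in train mode
print(RCNN_base[5][1].training)  # False -> its batchnorm was put back into eval mode
```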

@ptrblck, thanks for the clarification. How about if requires_grad is set to False? Will setting model.train() or model.eval() (in terms of the BatchNorm and Dropout layers) make a difference for those layers whose parameters have requires_grad set to False?

And could you also mention if there are any other layers apart from BatchNorm and Dropout whose behaviour changes upon using .train() and .eval()? Thanks!

Yes, eval()/train() do not change anything about the gradients or about whether parameters require gradients; they only change the behavior (code paths) of some layers.

If I’m not mistaken, batchnorm and dropout layers are currently the only layers in the PyTorch core that change their behavior between train() and eval().
However, custom layers (e.g. if you are using a specific repository from a 3rd party) can easily use the self.training flag to change the layer’s behavior.
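For example, a made-up custom layer could branch on self.training like this:

```python
import torch
import torch.nn as nn

# Hypothetical custom layer that branches on self.training
# (the flag toggled by model.train()/model.eval()).
class NoisyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        if self.training:
            # add noise only during training
            x = x + 0.1 * torch.randn_like(x)
        return self.linear(x)

layer = NoisyLinear(4, 2)
x = torch.randn(3, 4)

layer.train()
y_train = layer(x)   # noisy path, since self.training is True
layer.eval()
y_eval = layer(x)    # clean path, since self.training is False
```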

You mean to say eval() and train() don’t change anything if the parameters have requires_grad set to False, right? And could you also give examples of where the behaviour might change (as you mentioned in the answer)?

No. train() and eval() are completely independent of the requires_grad attribute of any parameters.
These methods change the self.training attribute of the model to True or False, which is then used to trigger different code paths.

Dropout will be disabled when eval() is used, and batchnorm layers will use their running stats to normalize the input activations instead of the batch stats.
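A small toy example showing both behaviors:

```python
import torch
import torch.nn as nn

# Toy demonstration of both behaviors described above.
drop = nn.Dropout(p=0.5)
bn = nn.BatchNorm1d(3)
x = torch.randn(8, 3)

drop.train()
print((drop(x) == 0).float().mean())  # roughly 0.5 -> about half the values are dropped
drop.eval()
print(torch.equal(drop(x), x))        # True -> dropout is a no-op in eval mode

bn.train()
out_train = bn(x)                     # normalized with the batch stats; running stats get updated
bn.eval()
out_eval = bn(x)                      # normalized with the running stats instead
print(torch.allclose(out_train, out_eval))  # generally False
```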

@ptrblck, thank you for the clarification, and I am really sorry to keep troubling you with a barrage of questions :sweat_smile:. I actually have a few more doubts about BatchNorm layers in PyTorch and their behavior. So can we have that discussion in this post itself, or is there a specific post (I have found quite a few about BatchNorm!) I can post my questions on?
Or should I open a new topic altogether? Thanks!

If it’s unrelated to the original question, please open a new topic and I’ll have a look. :slight_smile:

@ptrblck, I have made a new topic regarding my issue. Here is the link

Thanks!