I am trying to implement Faster R-CNN for object detection and am following this particular GitHub repo for the implementation. However, I have a doubt about this particular line in the code and just want to clarify whether I have understood things correctly. There is a block of code following this line where a few layers have been fixed by setting requires_grad to False. However, in this particular line in the train function, the base feature extractor is set to eval(), but then the following 2 lines train some specific layers.
I have 2 main doubts,
- Why is requires_grad set to False in the preceding few lines if the base is going to be used in evaluation mode anyway?
- What effect do the 2 lines below self.FRCNN_base.eval() have in terms of training in the def train() function?
Any help would be appreciated. Thanks
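For reference, I believe the relevant pattern boils down to something like the following (a condensed paraphrase, not the exact repo code; the backbone construction and block indices here are my own illustration):

```python
import torch.nn as nn
import torchvision

class FasterRCNN(nn.Module):
    """Illustrative skeleton only; the RPN, heads and forward() are omitted."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet101(pretrained=True)
        # Backbone: everything up to and including layer3.
        self.RCNN_base = nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
            resnet.layer1, resnet.layer2, resnet.layer3)

        # "Fix" the earliest blocks: no gradients will be computed for them.
        for p in self.RCNN_base[0].parameters():
            p.requires_grad = False
        for p in self.RCNN_base[1].parameters():
            p.requires_grad = False

    def train(self, mode=True):
        nn.Module.train(self, mode)          # put the whole model into train/eval mode
        if mode:
            self.RCNN_base.eval()            # backbone into eval mode...
            self.RCNN_base[5].train()        # ...but later blocks back into train mode
            self.RCNN_base[6].train()

            def set_bn_eval(m):
                if isinstance(m, nn.modules.batchnorm._BatchNorm):
                    m.eval()                 # batchnorm uses running stats, no stat updates

            self.RCNN_base.apply(set_bn_eval)
```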
It seems the repo is using a pretrained model to do transfer learning/fine-tuning, so you don’t want to train the early layers of the net (in this case the R-CNN base) but only the classification/regression layers.
requires_grad = False means you are freezing those layers during training, while model.eval() prepares the batchnorm/dropout layers to act in evaluation mode, so they are not the same thing.
I guess it’s to put the classification and regression layers into training mode.
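To make that distinction concrete, here is a minimal toy sketch (not the repo’s code) showing that freezing and eval() do different things:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Conv2d(8, 2, 3, padding=1),
)

# Freezing: no gradients are computed for these parameters, so the optimizer
# can never update them, but batchnorm/dropout still behave as in training.
for p in model[0].parameters():
    p.requires_grad = False

# eval(): gradients are still computed for all parameters that require them,
# but dropout is disabled and batchnorm switches to its running statistics.
model.eval()
out = model(torch.randn(4, 3, 16, 16))
out.mean().backward()
print(model[4].weight.grad is not None)  # True: eval() did not stop gradients
print(model[0].weight.grad)              # None: requires_grad=False stopped them
```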
@simaiden, thanks for the reply. But as you might have observed, in lines 266 and 283 there is a separate function being called to set the batch norm layers to eval mode. Why is this being done if the whole model is being set to eval mode anyway?
Also, are the other layers, such as self.RCNN_bbox_pred in lines 243 and 245, being trained? I mean, they ideally should be, because otherwise I don’t think we would get good results (please correct me if I am wrong). However, I have this doubt because these layers are not explicitly set to training mode in the def train() function.
RCNN_bbox_pred should be trained, as I cannot see where its parameters’ requires_grad attribute is manipulated.
As @simaiden said, the eval() and train() calls only change the behavior of batchnorm and dropout layers. They do not freeze the layers.
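One quick way to check which layers will actually be updated (a generic toy sketch, not the repo’s code) is to look at the requires_grad flags and hand only those parameters to the optimizer:

```python
import torch
import torch.nn as nn

# Tiny stand-in for the detector: a frozen "base" plus a trainable "bbox_pred" head.
model = nn.Module()
model.base = nn.Linear(16, 16)
model.bbox_pred = nn.Linear(16, 4)
for p in model.base.parameters():
    p.requires_grad = False        # only the base is frozen; bbox_pred is untouched

trainable = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
print([n for n, _ in trainable])   # ['bbox_pred.weight', 'bbox_pred.bias']

# Only the parameters that still require gradients need to go to the optimizer.
optimizer = torch.optim.SGD([p for _, p in trainable], lr=1e-3, momentum=0.9)
```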
Okay, so this is what I feel I have understood from the above discussion (please correct me if I am wrong):
The only difference between eval() and train() is the effect they have on the Batchnorm and Dropout layers. Otherwise, if the requires_grad attribute of a layer’s parameters is set to True, that layer will be trained and backprop will still happen for it, irrespective of whether the model is set to train() or eval().
Now coming to the code in the repo: in the def train() function, since self.RCNN_base is set to eval() anyway, am I correct in assuming that the line self.RCNN_base.apply(set_bn_eval) at line 283 is redundant, since the Batchnorm layers are already set to eval inside the if condition at line 272?
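For reference, here is how those calls compose in a small self-contained toy example (hypothetical module, not the repo’s code); each later call simply overwrites the self.training flag set by the one before it:

```python
import torch.nn as nn

base = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8),   # blocks 0/1
    nn.Conv2d(8, 8, 3), nn.BatchNorm2d(8),   # blocks 2/3
)

def set_bn_eval(m):
    # Same idea as the repo's helper: put every batchnorm layer into eval mode.
    if isinstance(m, nn.modules.batchnorm._BatchNorm):
        m.eval()

base.eval()              # every child (including all batchnorms) -> training=False
base[2].train()          # this conv block is flipped back to training=True
base[3].train()          # so is this batchnorm!
base.apply(set_bn_eval)  # re-fixes the batchnorms without touching the convs

print([m.training for m in base])  # [False, False, True, False]
```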
@ptrblck, thanks for the clarification. How about if requires_grad is set to False? Will setting model.eval() (in terms of Batchnorm and Dropout layers) make a difference for those layers whose parameters have requires_grad set to False?
And could you also mention whether there are any other layers, apart from Batchnorm and Dropout, whose behaviour changes upon using .eval()? Thanks!
eval()/train() do not change anything about the gradients or whether parameters require gradients, but they do change the behavior (code paths) of some layers.
If I’m not mistaken, batchnorm and dropout layers are the only layers in the PyTorch core at the moment whose behavior changes.
However, custom layers (e.g. if you are using a specific repository from a 3rd party) can easily use the self.training flag to change the layer’s behavior.
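For example, a hand-written module (a generic toy sketch, not from any particular repo) can branch on that flag just like the built-in layers do:

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Toy custom layer: adds noise only while the module is in training mode."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        out = self.linear(x)
        if self.training:                     # flag toggled by train()/eval()
            out = out + 0.1 * torch.randn_like(out)
        return out

layer = NoisyLinear(4, 2)
x = torch.randn(2, 4)
layer.train()
y_noisy = layer(x)   # self.training is True  -> noise is added
layer.eval()
y_clean = layer(x)   # self.training is False -> deterministic output
```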
You mean to say eval() and train() don’t change anything if the parameters have requires_grad set to False, right? And could you also give examples of where the behaviour might change (as you mentioned in your answer)?
train() and eval() are completely independent of the requires_grad attribute of any parameters.
These methods change the self.training attribute of the model to True or False, which is then used to trigger different code paths.
Dropout will be disabled when eval() is used, and batchnorm layers will use their running stats to normalize the input activations instead of the batch stats.
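A minimal toy sketch showing both effects (unrelated to the repo):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
drop = nn.Dropout(p=0.5)
x = torch.randn(8, 3)

bn.train()
y_train = bn(x)          # normalized with the *batch* mean/var; running stats updated
bn.eval()
y_eval = bn(x)           # normalized with the *running* mean/var; stats untouched
print(torch.allclose(y_train, y_eval))  # False: the two modes give different outputs

drop.train()
print((drop(x) == 0).any())   # True (with overwhelming probability): ~half the entries zeroed
drop.eval()
print((drop(x) == 0).any())   # False: dropout is a no-op in eval mode
```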
@ptrblck, thank you for the clarification, and I am really sorry to keep troubling you with a barrage of questions. I actually have a few more doubts about BatchNorm layers in PyTorch and their behavior. So can we have that discussion in this post itself, or is there a specific post (I have found quite a few about BatchNorm!) where I can post my questions? Or should I open a new topic altogether? Thanks!
If it’s unrelated to the original question, please open a new topic and I’ll have a look.
@ptrblck, I have made a new topic regarding my issue. Here is the link