I am training a region proposal network (RPN) as in Faster R-CNN with a ResNet-101 backbone. My issue is that the batch size is 1, since each image already contributes 256 proposals, half positive and half negative with respect to the ground-truth bounding boxes. Girshick calls this the “image-centric” batch method, although he does use a batch size of 8 on 8 GPUs for MS COCO 2014.
My question is: should I then remove the batch normalization layers (or deactivate them, as in this post here)?
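For reference, by "deactivating" I mean something like the following PyTorch sketch. This is my assumption of the general approach, not necessarily what the linked post does, and `freeze_batchnorm` is a name I made up. It puts every batch normalization layer in eval mode so it uses its stored running statistics, and stops gradient updates to its scale and shift:

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def freeze_batchnorm(model: nn.Module) -> nn.Module:
    """Make all batch-norm layers use stored running statistics and stop training them."""
    for module in model.modules():
        if isinstance(module, BN_TYPES):
            module.eval()  # use running_mean / running_var instead of batch statistics
            for param in module.parameters():
                param.requires_grad_(False)  # keep weight (scale) and bias (shift) fixed
    return model

# Tiny stand-in for a backbone block (hypothetical, not the real ResNet-101).
net = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
net.train()            # train() switches batch-norm layers back to training mode...
freeze_batchnorm(net)  # ...so the freeze has to be re-applied after every call to train()

running_mean_before = net[1].running_mean.clone()
out = net(torch.randn(1, 3, 32, 32))  # forward pass with batch size 1
running_mean_after = net[1].running_mean.clone()
```

With the layers frozen like this, a forward pass with a single image should leave the running statistics untouched, which only makes sense if the backbone starts from pretrained weights.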
What is the effect of batch normalization on a convolved image if the batch has size 1?
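To make the question concrete, here is a minimal NumPy sketch of what I understand training-mode batch normalization to compute on a convolutional feature map. The function name `batch_norm_train` and the shapes are made up for illustration, and the learnable scale and shift are left out. The statistics are taken per channel over the batch and spatial axes, so with a batch of 1 they come only from the H×W positions of that one image:

```python
import numpy as np

def batch_norm_train(x, eps=1e-5):
    """Training-mode batch norm on an (N, C, H, W) array, without scale/shift."""
    # Per-channel statistics over the batch and spatial axes.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)  # biased variance, used for normalization
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(1, 4, 8, 8))  # batch of one image, 4 channels
y = batch_norm_train(x)

# Per-channel statistics of the normalized output for the single image.
channel_means = y.mean(axis=(0, 2, 3))
channel_vars = y.var(axis=(0, 2, 3))
```

If this is right, then with N = 1 the layer still normalizes, but each image is normalized by its own per-channel statistics (much like instance normalization), and the running averages used at inference time are built from single-image estimates.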
The other option is to increase the batch size and adjust the learning rate accordingly, but I’d like the flexibility to use whatever batch size I want.