Understanding Normalization and Getting Better Results from Training

I’ve been building a face detection network by fine-tuning a pretrained Faster R-CNN. I can get boxes to appear in the general area of faces, but the results are very hit or miss. I thought adding a normalization transform would improve the results, but when I apply it my images come out like this:

Here’s what I get when I do not do my own normalization:

My suspicion is that normalization is being applied twice: once inside the Faster R-CNN source code and once by me. I suspect this because when I tried converting my images to grayscale while loading them, I got a tensor mismatch error, and the exact same error occurred both with and without my own normalization transform.

Here’s the error:

Here are my transforms:

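First, regarding the transform: FasterRCNN has a GeneralizedRCNNTransform module at the beginning that normalizes (and resizes) the input image. If you also normalize in your own transforms, the image ends up being normalized twice.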
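To illustrate what double normalization does to the pixel values, here is a minimal sketch. It assumes the ImageNet mean/std that torchvision’s Faster R-CNN uses by default when `image_mean`/`image_std` are not passed, and it reimplements the `(x - mean) / std` step by hand rather than calling the model. The `normalize` helper and the `linspace` stand-in image are hypothetical, for illustration only.

```python
import torch

# Assumed defaults: torchvision's Faster R-CNN falls back to these ImageNet
# statistics inside GeneralizedRCNNTransform when none are supplied.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def normalize(img: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: per-channel (x - mean) / std on a [3, H, W] tensor."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD


# Deterministic stand-in for an RGB image with values in [0, 1],
# the range ToTensor() produces.
img = torch.linspace(0, 1, 48).view(3, 4, 4)

once = normalize(img)              # what the model sees if you pass [0, 1] input
twice = normalize(normalize(img))  # what it sees if you also normalize yourself

print(f"once:  min={once.min().item():.2f} max={once.max().item():.2f}")
print(f"twice: min={twice.min().item():.2f} max={twice.max().item():.2f}")
```

Normalizing once keeps values within a few units of zero, while the second pass pushes them far outside that range, which would distort the images the backbone sees. In practice this suggests leaving out your own `Normalize` and passing float tensors in [0, 1] (for example from `ToTensor()`), letting the model handle normalization.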

Do you feed 3-channel (RGB) images as input, or do you convert them to grayscale? FasterRCNN expects a 3-channel input image.
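If grayscale data is still wanted, one common workaround (an assumption on my part, not something suggested earlier in this thread) is to replicate the single channel so the model still receives 3 channels. torchvision’s `transforms.Grayscale(num_output_channels=3)` does this at load time. The sketch below does the same on a tensor, using a hypothetical helper `to_three_channels`.

```python
import torch


def to_three_channels(img: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: turn a [1, H, W] image into [3, H, W].

    Tensors that already have 3 channels are returned unchanged.
    """
    if img.dim() != 3:
        raise ValueError(f"expected a [C, H, W] tensor, got shape {tuple(img.shape)}")
    if img.shape[0] == 1:
        # repeat() copies the data, so each channel can be modified independently
        return img.repeat(3, 1, 1)
    if img.shape[0] == 3:
        return img
    raise ValueError(f"expected 1 or 3 channels, got {img.shape[0]}")


gray = torch.rand(1, 8, 8)  # e.g. a grayscale image produced by ToTensor()
rgb_like = to_three_channels(gray)
print(rgb_like.shape)  # torch.Size([3, 8, 8])
```

All three channels hold identical values, so the image carries no more information than the grayscale version, but its shape matches what a model pretrained on RGB images expects.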

I feed it 3-channel images.