Understanding Normalization and Getting Better Results from Training

I’ve been working on building a facial detection network by fine-tuning a pretrained Faster R-CNN. I have been able to get boxes to populate in the general area of faces, but it is very hit or miss. I thought that a normalization transform would give me better results, but when I apply the transform my images come out like this:

Here’s what I get when I do not do my own normalization:

My thought is that the normalization transform may be applied twice: once inside the Faster R-CNN source code and once by me. The reason I suspect it is applied twice is that I received a tensor size mismatch error when I tried converting my images to grayscale at load time, and the exact same error occurred both with and without my Normalize transform.
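
For illustration, here is a hypothetical pipeline showing the suspected double application (the Normalize statistics below are the standard ImageNet values, not necessarily the exact ones I use):

```python
import torchvision.transforms as T

# My own normalization, applied when the image is loaded...
transform = T.Compose([
    T.ToTensor(),  # converts a PIL image to a [0, 1] float tensor
    T.Normalize(mean=[0.485, 0.456, 0.406],  # standard ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])
# ...and then, if Faster R-CNN also normalizes internally, the already
# normalized tensor would be normalized a second time in model.forward().
```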

Here’s the error:

Here are my transforms:

First, regarding the transform: FasterRCNN has a GeneralizedRCNNTransform module at the beginning of its forward pass, which normalizes (and resizes) the input image.
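
So if you also apply Normalize in your own pipeline, the image gets normalized twice. A minimal sketch of one way around that, assuming the torchvision fasterrcnn_resnet50_fpn constructor: drop Normalize from your own pipeline (keep just ToTensor) and, if you want non-default statistics, pass them to the model so the internal transform normalizes exactly once:

```python
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# image_mean/image_std are forwarded to the internal
# GeneralizedRCNNTransform; these are the defaults, shown explicitly.
model = fasterrcnn_resnet50_fpn(
    pretrained=True,
    image_mean=[0.485, 0.456, 0.406],
    image_std=[0.229, 0.224, 0.225],
)
print(model.transform)  # GeneralizedRCNNTransform(Normalize(...), Resize(...))
```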

Do you feed 3-channel (RGB) images as input, or do you convert them to grayscale? FasterRCNN expects a 3-channel input image.
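
If your sources are grayscale, here is a sketch of two ways to satisfy that (assuming PIL loading; "face.jpg" is just a placeholder path):

```python
from PIL import Image
import torchvision.transforms as T

# Option 1: convert at load time; the single channel is replicated to RGB
img = Image.open("face.jpg").convert("RGB")

# Option 2: keep the grayscale look but emit three identical channels
transform = T.Compose([
    T.Grayscale(num_output_channels=3),
    T.ToTensor(),
])
tensor = transform(img)
print(tensor.shape)  # torch.Size([3, H, W])
```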

I feed it 3-channel images.