Faster RCNN Object Detection shape error

Hello,

I have a dataset of grayscale images of shape [512, 1536] that are already normalized to the range [0, 1]. I want to fine-tune the pre-trained FasterRCNN on these images and then run predictions with it.

I have my own normalization function because the input is not in a standard RGB format, so I don’t want to use the normalize transform from GeneralizedRCNNTransform. Because the conv1 layer of the FasterRCNN backbone expects a 3-channel input, I changed its in_channels argument from 3 to 1 and summed the weights over the input-channel dimension (dim=1), which gives a weight of shape [64, 1, 7, 7].

The modifications are shown below:

import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_channels = 1
model.backbone.body.conv1.in_channels = in_channels
# Sum the pretrained weights over the input-channel axis: [64, 3, 7, 7] -> [64, 1, 7, 7]
model.backbone.body.conv1.weight.data = model.backbone.body.conv1.weight.data.sum(dim=1, keepdim=True)
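
To check what summing the weights does, here is a minimal sketch. It uses a randomly initialized layer with the same shape as the ResNet-50 stem as a stand-in for the pretrained conv1 (an assumption, so that nothing has to be downloaded), and the names rgb_conv and gray_conv are hypothetical. It suggests that a 1-channel layer holding the summed weights gives the same output as the original 3-channel layer fed the grayscale image copied into all three channels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the pretrained stem: same shape as ResNet-50's conv1, random weights.
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Build a fresh 1-channel layer rather than editing in_channels on the old one.
gray_conv = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    # dim=1 is the input-channel axis of a [out, in, kH, kW] weight tensor.
    gray_conv.weight.copy_(rgb_conv.weight.sum(dim=1, keepdim=True))

gray = torch.rand(1, 1, 64, 96)        # small grayscale batch [N, 1, H, W]
as_rgb = gray.expand(-1, 3, -1, -1)    # the same image repeated in 3 channels

with torch.no_grad():
    out_gray = gray_conv(gray)
    out_rgb = rgb_conv(as_rgb)

# Largest absolute difference between the two outputs (floating-point noise only).
max_diff = (out_gray - out_rgb).abs().max().item()
```

On the real model, the analogous step would be assigning such a layer to model.backbone.body.conv1 after copying in the summed pretrained weights.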

I wrote my own Dataset class and a Dataloader object that returns the training and validation dataloaders.

The image and targets are built in the dataset's __getitem__ method as follows:

torch_image = torch.from_numpy(np_image).reshape(1, 512, 1536)  # dtype: torch.float64
torch_boxes = torch.from_numpy(boxes)
torch_classes = torch.from_numpy(classes)

# Dictionary of boxes and classes
target = {'boxes': torch_boxes, 'labels': torch_classes}

An example of the contents of train_loader with batch_size=1:
image, target = next(iter(train_loader))

image:

(tensor([[[0.4000, 0.4023, 0.4248,  ..., 0.4196, 0.4145, 0.4077],
          [0.4181, 0.4204, 0.4214,  ..., 0.4211, 0.3902, 0.4131],
          [0.4070, 0.4200, 0.4047,  ..., 0.4118, 0.3981, 0.4129],
          ...,
          [0.4368, 0.4229, 0.4270,  ..., 0.4148, 0.4243, 0.4119],
          [0.4266, 0.4373, 0.4256,  ..., 0.4088, 0.4199, 0.4068],
          [0.4313, 0.4400, 0.4073,  ..., 0.4173, 0.4254, 0.4254]]],
        dtype=torch.float64),)

target:

({'boxes': tensor([[ 671,  144,  955,  385],
          [1252,  138, 1535,  380],
          [   1,  177,  267,  512],
          [ 211,  182,  261,  215],
          [ 264,  184,  310,  208],
          [ 487,  164,  544,  194],
          [ 902,  129,  962,  208]], dtype=torch.int32),
  'labels': tensor([4, 4, 4, 4, 4, 4, 6], dtype=torch.int32)},)

Here, class 4 indicates Car and class 6 indicates Truck.

But when I try to compute the output of the model using output = model(image, target), I get the following error:

Traceback (most recent call last):
  File "C:\Users\DELL\AppData\Roaming\Python\Python37\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-87f75128d770>", line 1, in <module>
    output = model(img, target)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\detection\generalized_rcnn.py", line 94, in forward
    features = self.backbone(images.tensors)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\detection\backbone_utils.py", line 44, in forward
    x = self.body(x)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\_utils.py", line 63, in forward
    x = module(x)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\conv.py", line 396, in _conv_forward
    self.padding, self.dilation, self.groups)

RuntimeError: Given groups=1, weight of size [64, 1, 7, 7], expected input[1, 3, 448, 1344] to have 1 channels, but got 3 channels instead

Edit 1: I changed/added a few lines in the code which can be seen below:

grcnn = torchvision.models.detection.transform.GeneralizedRCNNTransform(min_size=500, max_size=1300, image_mean=[0], image_std=[1])
model.transform = grcnn

Because I had already normalized my input images using a different algorithm, I decided to turn off the normalization by using image_mean=[0] and image_std=[1], as suggested by @ptrblck here.

I also explicitly converted my boxes and classes tensors to floats, as shown below:
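
The 3-channel input reported in the first traceback is consistent with how per-channel normalization broadcasts. The sketch below uses a hypothetical normalize helper written in the style of the transform's normalization step (an assumption, not the library code itself). With the default three-element ImageNet mean and std, a [1, H, W] image broadcasts to [3, H, W]; with single-element statistics it stays at one channel, and mean 0 with std 1 leaves the values unchanged.

```python
import torch

image = torch.rand(1, 512, 1536)   # one grayscale image, [C, H, W]

def normalize(img, mean, std):
    # Hypothetical helper: per-channel statistics are reshaped to [C, 1, 1]
    # and broadcast against the image.
    mean = torch.as_tensor(mean, dtype=img.dtype)
    std = torch.as_tensor(std, dtype=img.dtype)
    return (img - mean[:, None, None]) / std[:, None, None]

# Three-element statistics (the ImageNet defaults) expand the single channel to three.
rgb_stats = normalize(image, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# One-element statistics keep the single channel; mean 0 and std 1 are a no-op.
gray_stats = normalize(image, [0.0], [1.0])

print(rgb_stats.shape, gray_stats.shape)
```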

torch_image = torch.from_numpy(np_image).reshape(1, 512, 1536)  # dtype: torch.float64
torch_boxes = torch.from_numpy(boxes).type(torch.FloatTensor)
torch_classes = torch.from_numpy(classes).type(torch.FloatTensor)

However, when I try to check the model output using

image, target = next(iter(train_loader))
output = model(image, target)

I am getting another error:

Traceback (most recent call last):
  File "C:\Users\DELL\AppData\Roaming\Python\Python37\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-96b845994ad7>", line 1, in <module>
    model(img, target)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\detection\generalized_rcnn.py", line 94, in forward
    features = self.backbone(images.tensors)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\detection\backbone_utils.py", line 44, in forward
    x = self.body(x)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\_utils.py", line 63, in forward
    x = module(x)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\conv.py", line 396, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: expected scalar type Double but found Float

I don’t understand why I keep getting this error or where the problem lies. I would really appreciate your help.
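
The message in the second traceback is consistent with a float64 image meeting the model's float32 weights. The following is a minimal sketch under that assumption, with made-up arrays standing in for np_image, boxes, and classes. It casts the image and boxes to float32 and the labels to int64, which are the dtypes the torchvision detection models document for their inputs and targets.

```python
import numpy as np
import torch

# Placeholder data with the same dtypes as in the post (assumed, for illustration).
np_image = np.random.rand(512, 1536)                      # float64 in [0, 1)
boxes = np.array([[671, 144, 955, 385],
                  [902, 129, 962, 208]], dtype=np.int32)  # [x1, y1, x2, y2]
classes = np.array([4, 6], dtype=np.int32)

# Image: float32 so that it matches the model parameters.
torch_image = torch.from_numpy(np_image).reshape(1, 512, 1536).float()
# Boxes: float32, shape [N, 4].
torch_boxes = torch.from_numpy(boxes).float()
# Labels: int64 class indices, not floats.
torch_classes = torch.from_numpy(classes).long()

target = {'boxes': torch_boxes, 'labels': torch_classes}
```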

Edit 2:
As I kept going through the forum, I found this from @ptrblck, which did the trick, and I got the following output:

{'loss_classifier': tensor(0.1866, grad_fn=<NllLossBackward>),
 'loss_box_reg': tensor(0.0092, grad_fn=<DivBackward0>),
 'loss_objectness': tensor(0.9861, grad_fn=<BinaryCrossEntropyWithLogitsBackward>),
 'loss_rpn_box_reg': tensor(0.0660, grad_fn=<DivBackward0>)}

Can someone explain why the loss_objectness is so high?

Therefore, my questions are:

  1. What else do I need to change in the FasterRCNN model (layers) so that it works with grayscale images?
  2. Do I still have to use the normalize function of GeneralizedRCNNTransform? I have already normalized the inputs.
  3. Are the images and targets properly prepared in the Dataset Class?: Yes

Thank You.

Edit 3:

  1. How do I map the class labels for FasterRCNN so that it uses the numbers 4 and 6 mentioned above for Car and Truck, respectively?
    In my dataset the Car label is 4 and the Truck label is 6, but that is not the case in the COCO dataset. How should I map these numbers so that my model recognizes that 4 should be detected as Car and 6 as Truck?
    Better yet, I don’t mind training from scratch. How do I train for only the Car and Truck classes?
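
One way to handle the label question, sketched under the assumption that only Car and Truck are trained: remap the dataset ids to contiguous training ids, keeping 0 for background as the torchvision detection models expect, and map predictions back afterwards. The dictionary names and the remap_labels helper are hypothetical.

```python
# Hypothetical mapping from the dataset's own ids to contiguous training ids.
# Index 0 is reserved for background.
DATASET_TO_TRAIN = {4: 1, 6: 2}
TRAIN_TO_NAME = {0: 'background', 1: 'car', 2: 'truck'}
NUM_CLASSES = len(TRAIN_TO_NAME)  # 3: background + car + truck

# Inverse mapping, for turning predicted ids back into the dataset's ids.
TRAIN_TO_DATASET = {train_id: ds_id for ds_id, train_id in DATASET_TO_TRAIN.items()}

def remap_labels(labels):
    """Convert dataset label ids (4, 6) to training ids (1, 2)."""
    return [DATASET_TO_TRAIN[int(label)] for label in labels]

# Labels from the example target shown earlier in the post.
train_labels = remap_labels([4, 4, 4, 4, 4, 4, 6])
print(train_labels)
```

With the labels remapped, the classification head can be swapped for one with NUM_CLASSES outputs, for example by assigning torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, NUM_CLASSES) to model.roi_heads.box_predictor, where in_features is read from model.roi_heads.box_predictor.cls_score.in_features.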

Thank You.