Hello,
I have a dataset of grayscale images of shape [512, 1536] that are already normalized to the range [0, 1]. I want to train the pre-trained FasterRCNN on these images and run predictions later.
I have my own normalization function because the input is not in a standard RGB format, so I don’t want to use the normalize transform from GeneralizedRCNNTransform. As the FasterRCNN backbone's conv1 layer expects a 3-channel input, I made some modifications: I changed the in_channels argument from 3 to 1 and summed the weights over the input-channel dimension (dim=1), giving a weight of shape [64, 1, 7, 7].
The modifications are shown below:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_channels = 1
model.backbone.body.conv1.in_channels = in_channels
model.backbone.body.conv1.weight.data = model.backbone.body.conv1.weight.data.sum(dim=1, keepdim=True)
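For reference, a minimal sketch of the same weight-summing idea on a standalone layer. It assumes a randomly initialized nn.Conv2d as a stand-in for the backbone's conv1 (to avoid downloading weights) and checks that the summed 1-channel kernel gives the same response as the 3-channel kernel applied to a grayscale image copied into three channels. The names conv_rgb and conv_gray are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the first backbone layer (same shape as ResNet-50's conv1).
conv_rgb = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# Build a fresh 1-channel layer instead of editing in_channels in place,
# then copy in the RGB kernels summed over the input-channel dimension.
conv_gray = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    conv_gray.weight.copy_(conv_rgb.weight.sum(dim=1, keepdim=True))

gray = torch.rand(1, 1, 32, 96)       # small grayscale batch [N, 1, H, W]
rgb = gray.expand(-1, 3, -1, -1)      # the same image repeated in 3 channels

with torch.no_grad():
    out_gray = conv_gray(gray)
    out_rgb = conv_rgb(rgb)

# Largest absolute difference between the two responses (float rounding only).
max_diff = (out_gray - out_rgb).abs().max().item()
```

On the real model, the equivalent step would presumably be assigning such a layer to model.backbone.body.conv1, so that the module's attributes and its weight shape stay consistent.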
I wrote my own Dataset class and a DataLoader object which returns the training and validation dataloaders.
The image and targets are built in the dataset's __getitem__ method as follows:
torch_image = torch.from_numpy(np_image).reshape(1, 512, 1536) # dtype: torch.float64
torch_boxes = torch.from_numpy(boxes)
torch_classes = torch.from_numpy(classes)
# Dictionary of boxes and classes
target = {'boxes': torch_boxes, 'labels': torch_classes}
An example of the contents of train_loader with batch_size=1:
image, target = next(iter(train_loader))
image:
(tensor([[[0.4000, 0.4023, 0.4248, ..., 0.4196, 0.4145, 0.4077],
[0.4181, 0.4204, 0.4214, ..., 0.4211, 0.3902, 0.4131],
[0.4070, 0.4200, 0.4047, ..., 0.4118, 0.3981, 0.4129],
...,
[0.4368, 0.4229, 0.4270, ..., 0.4148, 0.4243, 0.4119],
[0.4266, 0.4373, 0.4256, ..., 0.4088, 0.4199, 0.4068],
[0.4313, 0.4400, 0.4073, ..., 0.4173, 0.4254, 0.4254]]],
dtype=torch.float64),)
target:
({'boxes': tensor([[ 671, 144, 955, 385],
[1252, 138, 1535, 380],
[ 1, 177, 267, 512],
[ 211, 182, 261, 215],
[ 264, 184, 310, 208],
[ 487, 164, 544, 194],
[ 902, 129, 962, 208]], dtype=torch.int32),
'labels': tensor([4, 4, 4, 4, 4, 4, 6], dtype=torch.int32)},)
Here, class 4 indicates Car and class 6 indicates Truck.
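The tuple-wrapped output above is what a zip-style collate function produces. A minimal sketch of such a loader follows, assuming this is roughly how train_loader is built (the original collate function is not shown). ToyDetectionDataset and collate_fn are hypothetical names, and the box values are dummies.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDetectionDataset(Dataset):
    """Hypothetical stand-in for the custom dataset, with random images."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        image = torch.rand(1, 512, 1536)          # [C, H, W], float32
        n = idx + 1                               # different box count per sample
        boxes = torch.tensor([[10.0, 20.0, 110.0, 220.0]]).repeat(n, 1)
        labels = torch.full((n,), 4, dtype=torch.int64)
        return image, {"boxes": boxes, "labels": labels}


def collate_fn(batch):
    # Keep samples as tuples instead of stacking them, because the number
    # of boxes (and possibly the image size) differs between samples.
    return tuple(zip(*batch))


loader = DataLoader(ToyDetectionDataset(), batch_size=2, collate_fn=collate_fn)
images, targets = next(iter(loader))
```

With this layout, images is a tuple of [1, 512, 1536] tensors and targets is a tuple of dictionaries, one per image.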
But when I try to get the output of the model using output = model(image, target), I get the following error:
Traceback (most recent call last):
File "C:\Users\DELL\AppData\Roaming\Python\Python37\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-12-87f75128d770>", line 1, in <module>
output = model(img, target)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\detection\generalized_rcnn.py", line 94, in forward
features = self.backbone(images.tensors)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\detection\backbone_utils.py", line 44, in forward
x = self.body(x)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\_utils.py", line 63, in forward
x = module(x)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\conv.py", line 396, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 1, 7, 7], expected input[1, 3, 448, 1344] to have 1 channels, but got 3 channels instead
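The [1, 3, 448, 1344] in the error is the input after the model's built-in transform, not the raw [1, 512, 1536] image. A rough sketch of where 448 x 1344 comes from, assuming the transform's defaults of min_size=800 and max_size=1333 and padding to a multiple of 32. transformed_size is a hypothetical helper written for this illustration, not a torchvision function.

```python
import math


def transformed_size(height, width, min_size=800, max_size=1333, size_divisible=32):
    """Approximate the (H, W) that reaches the backbone after resize and padding."""
    # Scale so the short side reaches min_size, unless that would push the
    # long side past max_size, in which case the long side limits the scale.
    scale = min(min_size / min(height, width), max_size / max(height, width))
    new_h = math.floor(height * scale)
    new_w = math.floor(width * scale)
    # The batch is padded so both sides are multiples of size_divisible.
    pad_h = math.ceil(new_h / size_divisible) * size_divisible
    pad_w = math.ceil(new_w / size_divisible) * size_divisible
    return pad_h, pad_w


padded = transformed_size(512, 1536)   # (448, 1344)
```

Here the long side is the limiting one (1333 / 1536 is smaller than 800 / 512), so the image is scaled down slightly and then padded.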
Edit 1: I changed/added a few lines in the code which can be seen below:
grcnn = torchvision.models.detection.transform.GeneralizedRCNNTransform(min_size=500, max_size=1300, image_mean=[0], image_std=[1])
model.transform = grcnn
Because I had already normalized my input images using a different algorithm, I decided to turn off the normalization by using image_mean=[0] and image_std=[1], as suggested by @ptrblck here.
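The reason a single-element mean and std matter here: assuming the transform normalizes with (image - mean[:, None, None]) / std[:, None, None], the default three-element ImageNet statistics broadcast a 1-channel image to 3 channels, which would account for the earlier "got 3 channels" error. A minimal sketch in plain torch, outside the model:

```python
import torch

image = torch.rand(1, 512, 1536)                  # grayscale image, [C, H, W]

# Default ImageNet statistics used by torchvision detection models.
mean3 = torch.tensor([0.485, 0.456, 0.406])
std3 = torch.tensor([0.229, 0.224, 0.225])

# Broadcasting [1, H, W] against [3, 1, 1] silently yields 3 channels.
normalized3 = (image - mean3[:, None, None]) / std3[:, None, None]

# One-element statistics keep the single channel, and mean=0 / std=1
# leave the pixel values unchanged.
mean1 = torch.tensor([0.0])
std1 = torch.tensor([1.0])
normalized1 = (image - mean1[:, None, None]) / std1[:, None, None]
```

So image_mean=[0] and image_std=[1] both disable the rescaling and keep the input at one channel.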
I also explicitly cast my boxes and classes tensors to float, as shown below:
torch_image = torch.from_numpy(np_image).reshape(1, 512, 1536) # dtype: torch.float64
torch_boxes = torch.from_numpy(boxes).type(torch.FloatTensor)
torch_classes = torch.from_numpy(classes).type(torch.FloatTensor)
However, when I try to check the model output using
image, target = next(iter(train_loader))
output = model(image, target)
I am getting another error:
Traceback (most recent call last):
File "C:\Users\DELL\AppData\Roaming\Python\Python37\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-96b845994ad7>", line 1, in <module>
model(img, target)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\detection\generalized_rcnn.py", line 94, in forward
features = self.backbone(images.tensors)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\detection\backbone_utils.py", line 44, in forward
x = self.body(x)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "F:\anaconda\envs\detectron2\lib\site-packages\torchvision\models\_utils.py", line 63, in forward
x = module(x)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "F:\anaconda\envs\detectron2\lib\site-packages\torch\nn\modules\conv.py", line 396, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: expected scalar type Double but found Float
I don’t understand why I keep getting this error and where the problem lies. Would really appreciate your help.
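This error typically means the input tensor and the model weights have different dtypes: torch.from_numpy keeps NumPy's float64, while model parameters are float32 by default. A minimal sketch of one way to prepare the tensors, assuming the detection model expects float32 images and boxes and int64 labels. The random image is a placeholder and the box values are two rows from the example above.

```python
import numpy as np
import torch

np_image = np.random.rand(512, 1536)              # NumPy default dtype: float64
boxes = np.array([[671, 144, 955, 385],
                  [902, 129, 962, 208]], dtype=np.int32)
classes = np.array([4, 6], dtype=np.int32)

# Cast the image to float32 so it matches the model's weights.
torch_image = torch.from_numpy(np_image).float().reshape(1, 512, 1536)

# Boxes as float32 [N, 4]; labels as int64 [N] (class indices, not floats).
torch_boxes = torch.from_numpy(boxes).float()
torch_classes = torch.from_numpy(classes).long()

target = {"boxes": torch_boxes, "labels": torch_classes}
```

Note that under this assumption the labels stay integer-typed, unlike the float cast shown in Edit 1.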
Edit 2:
As I kept going through the forum, I found this from @ptrblck, which did the trick, and I got the following output:
{'loss_classifier': tensor(0.1866, grad_fn=<NllLossBackward>),
'loss_box_reg': tensor(0.0092, grad_fn=<DivBackward0>),
'loss_objectness': tensor(0.9861, grad_fn=<BinaryCrossEntropyWithLogitsBackward>),
'loss_rpn_box_reg': tensor(0.0660, grad_fn=<DivBackward0>)}
Can someone explain why loss_objectness is so high?
So, my questions are:
- What else do I need to change in the FasterRCNN model (layers) so that it works with grayscale images?
- Do I still have to use the normalize function of GeneralizedRCNNTransform? I have already normalized the inputs.
- Are the images and targets properly prepared in the Dataset class? Yes
Thank You.
Edit 3:
- How do I map the class labels for FasterRCNN so that it uses the numbers 4 and 6, as mentioned above, for Car and Truck respectively?
In my dataset the Car label is 4 and the Truck label is 6, but that is not the case for the COCO dataset. How should I map these numbers so that my model recognizes that 4 should be detected as Car and 6 as Truck?
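A minimal sketch of one possible mapping, assuming the model is trained on contiguous class indices with 0 reserved for background (the torchvision detection convention) and that dataset ids are translated before training and translated back after prediction. DATASET_TO_MODEL, MODEL_TO_NAME and remap_labels are hypothetical names.

```python
import torch

# Hypothetical lookup tables: dataset ids -> contiguous model indices.
# Index 0 is left free because it stands for the background class.
DATASET_TO_MODEL = {4: 1, 6: 2}          # 4 = Car, 6 = Truck in the dataset
MODEL_TO_NAME = {1: "Car", 2: "Truck"}   # for reading predictions back


def remap_labels(labels):
    """Convert dataset label ids into an int64 tensor of model class indices."""
    return torch.tensor([DATASET_TO_MODEL[int(l)] for l in labels],
                        dtype=torch.int64)


dataset_labels = torch.tensor([4, 4, 4, 4, 4, 4, 6], dtype=torch.int32)
model_labels = remap_labels(dataset_labels)      # values 1 (Car) and 2 (Truck)
names = [MODEL_TO_NAME[int(i)] for i in model_labels]
```

The remapping would naturally live in the dataset's __getitem__, so the model only ever sees indices 1 and 2.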
Better yet, I don’t mind training from scratch. How do I train for only the Car and Truck classes?
Thank You.