Error when trying to train FasterRCNN with a custom backbone on GRAYSCALE images

I am following the https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html#putting-everything-together tutorial in order to create an object detector for 1 class on GRAYSCALE images.

Here is my code (note that I am using a DenseNet as the BACKBONE, pretrained by me on my own dataset):

import os

import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

num_classes = 2  # 1 class + background

# load my DenseNet, pretrained on my own dataset, and keep its feature extractor
model = torch.load(os.path.join(patch_classifier_model_dir, "densenet121.pt"))
backbone = model.features
backbone.out_channels = 1024  # DenseNet-121's final feature map has 1024 channels

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=num_classes,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

# move model to the right device
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)
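
As a sanity check, the backbone on its own does accept single-channel input (a minimal sketch; the 224x224 size is arbitrary):

with torch.no_grad():
    dummy = torch.rand(1, 1, 224, 224).to(device)  # batch of one GRAYSCALE image
    features = backbone(dummy)
print(features.shape)  # expected: torch.Size([1, 1024, 7, 7])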

This is the error that I am running into:

RuntimeError: Given groups=1, weight of size [64, 1, 7, 7], expected input[2, 3, 1344, 800] to have 1 channels, but got 3 channels instead

Based on the FasterRCNN architecture, I assume the problem is in the transform component, because it normalizes the images with 3-channel RGB statistics even though they are initially grayscale:

FasterRCNN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): Sequential(
    (conv0): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
      
      ...............
        
    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(1024, 15, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(1024, 60, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=50176, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=2, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=8, bias=True)
    )
  )
)
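
If I understand the Normalize step correctly, its 3-element mean/std is broadcast against the image tensor, so a 1-channel image silently becomes a 3-channel one. A minimal sketch of that broadcasting behavior (the 800x800 size is illustrative):

mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])
gray = torch.rand(1, 800, 800)  # one GRAYSCALE image, C=1
normalized = (gray - mean[:, None, None]) / std[:, None, None]
print(normalized.shape)  # torch.Size([3, 800, 800]) -- channel dim broadcast from 1 to 3

This would explain why conv0, whose weight expects 1 input channel, receives a 3-channel tensor.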

Am I correct? If so, how do I resolve this issue? Is there a STANDARD PRACTICE for dealing with grayscale images and FasterRCNN?

Thanks in advance! Really appreciate it!

I’m facing the same issue. Have you found a workaround for this?

FasterRCNN works with both grayscale and RGB images. The error is coming from somewhere in the forward pass, where the output of one component does not match the input expected by the next. You will have to debug a little with PyTorch hooks to figure out which component it is; see the sketch below.
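
For example, forward pre-hooks can print the tensor shapes entering each top-level component (a minimal sketch, reusing the model and device from the question; the 800x800 dummy image is illustrative):

# print the input tensor shapes of each top-level FasterRCNN component
# (transform, backbone, rpn, roi_heads) to find where the mismatch appears
def shape_pre_hook(name):
    def hook(module, inputs):
        shapes = [tuple(t.shape) for t in inputs if torch.is_tensor(t)]
        print(f"{name} input tensor shapes: {shapes}")
    return hook

handles = [m.register_forward_pre_hook(shape_pre_hook(n))
           for n, m in model.named_children()]

model.eval()
with torch.no_grad():
    try:
        model([torch.rand(1, 800, 800).to(device)])  # FasterRCNN takes a list of CxHxW images
    except RuntimeError as e:
        print("forward failed:", e)

for h in handles:
    h.remove()

The print just before the failure tells you which component received the unexpected shape.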