Hi, I’m a complete beginner with object detection, so apologies if my issue is glaringly obvious!
I’m working with fasterrcnn_resnet50_fpn which has been modified to take in an arbitrary number of input channels using the following code:
self.faster_rcnn = create_faster_rcnn_model(
    num_classes, image_mean=image_mean, image_std=image_std
)
if num_channels != 3:
    # Adjust the first conv layer to handle an arbitrary number of input channels
    self.faster_rcnn.backbone.body.conv1 = nn.Conv2d(
        num_channels,
        self.faster_rcnn.backbone.body.conv1.out_channels,
        kernel_size=7,
        stride=2,
        padding=3,
        bias=False,
    )
If I use a 3-channel input, I get predictions during inference with no problem. However, with any other number of channels I always get 0 predictions.
I’ve checked my loss for a 2-channel input and it converges.
So I think that means my gradients are connected and training properly. Has anyone run into a similar issue? Please let me know if more information is needed!