Train torchvision Faster R-CNN on 4-channel images

Hello!
I train Faster-RCNN from torchvision on 4-channels image. I changed resnet50 backbone for 4-channels input but I get error in “torchvision/models/detection/transform.py”.

return (image - mean[:, None, None]) / std[:, None, None]
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0
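The error is a broadcasting mismatch: that line in transform.py subtracts a per-channel mean of length 3 (the RGB default) from an image whose channel dimension is 4. A minimal NumPy sketch (shapes chosen for illustration) reproduces the problem and shows that 4-entry mean/std values broadcast fine:

```python
import numpy as np

# 4-channel image tensor, channels-first, as in torchvision
image = np.random.rand(4, 32, 32)

# default ImageNet mean/std: one entry per RGB channel
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

try:
    # same expression as transform.py: (3, 1, 1) cannot broadcast with (4, 32, 32)
    (image - mean[:, None, None]) / std[:, None, None]
except ValueError as e:
    print("broadcast error:", e)

# with one entry per channel, the expression broadcasts as intended
mean4 = np.zeros(4)
std4 = np.ones(4)
normalized = (image - mean4[:, None, None]) / std4[:, None, None]
print(normalized.shape)  # (4, 32, 32)
```

(NumPy raises ValueError where PyTorch raises RuntimeError, but the shape rule that fails is the same.)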

Can I somehow turn off these transforms? Right now I can't train the model with them.
Thanks for any answers!

Did you fix this issue? If so, could you share how to skip this error? Thanks

I got the same error. It happens because torchvision's FasterRCNN applies a GeneralizedRCNNTransform under the hood, and its default image_mean and image_std have only three entries. I was able to fix the error by running the following:

from torchvision.models.detection.transform import GeneralizedRCNNTransform
from torchvision.models.detection import FasterRCNN

n_channels = 4

class ModTransform(GeneralizedRCNNTransform):
    def __init__(self):
        # adjust image_mean and image_std as needed;
        # both must have one entry per channel
        super().__init__(min_size=800, max_size=1300,
                         image_mean=[0.0] * n_channels,
                         image_std=[1.0] * n_channels)

    def normalize(self, image):
        # override to skip (or customize) per-channel normalization
        return image

    def resize(self, image, target):
        # override to skip (or customize) resizing
        return image, target

# Define a FasterRCNN model according to your specs
model = FasterRCNN(...)

# Change the model transform to ModTransform
transform = ModTransform()
model.transform = transform