ERROR: Changing input channels with Inception v3

Hi everybody.
My English is not good; I'm from Vietnam.
I tried to change the input channels so that Inception receives 4-channel input (my code is below), built from one .png picture plus .npy depth information, but I get an error.
My code:

def Inception(in_planes, out_planes, pretrained=False):
    if pretrained is True:
        model = models.inception_v3(pretrained=True)
        print("Pretrained model is loaded")
    else:
        model = models.inception_v3(pretrained=False)
    if in_planes == 4:
        # replace the first conv of the stem so the model accepts 4 input channels
        model.Conv2d_1a_3x3.conv = nn.Conv2d(4, 64, kernel_size=3, stride=2, padding=2, bias=False)
        nn.init.kaiming_normal_(model.Conv2d_1a_3x3.conv.weight, mode='fan_out', nonlinearity='relu')
    # Parameters of newly constructed modules have requires_grad=True by default
    model.fc = nn.Linear(model.fc.in_features, out_planes)
    return model

My error:

Traceback (most recent call last):
  File "train_softmax.py", line 307, in <module>
    pretrained_optim_path=args.pretrained_optim_path)
  File "train_softmax.py", line 155, in train_model
    outputs,aux_outputs = model(inputs)
  File "/home/viet/anaconda3/envs/pythonProject11/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/viet/anaconda3/envs/pythonProject11/lib/python3.7/site-packages/torchvision/models/inception.py", line 101, in forward
    x = self.Conv2d_1a_3x3(x)
  File "/home/viet/anaconda3/envs/pythonProject11/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/viet/anaconda3/envs/pythonProject11/lib/python3.7/site-packages/torchvision/models/inception.py", line 352, in forward
    x = self.conv(x)
  File "/home/viet/anaconda3/envs/pythonProject11/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/viet/anaconda3/envs/pythonProject11/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/viet/anaconda3/envs/pythonProject11/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 64 4 3 3, expected input[16, 3, 299, 299] to have 4 channels, but got 3 channels instead

My input is one .png picture and one .npy file. I concatenate them and transform the result to a tensor.

Please help me.

Based on the error message it seems that the manipulation of the model is indeed working, while the input concatenation is not:

RuntimeError: Given groups=1, weight of size 64 4 3 3, expected input[16, 3, 299, 299] to have 4 channels, but got 3 channels instead

This error indicates that the input is still using 3 channels, while 4 are expected.
Could you double check and/or post the code which creates the new input tensor?
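
As a quick check you could print the input shape right before the forward pass (a minimal sketch based on the line from your traceback):

# inside the training loop, right before the forward pass
print(inputs.shape)  # should be [batch_size, 4, 299, 299]
outputs, aux_outputs = model(inputs)

If the channel dimension is 3 here, the depth channel is being dropped before the model is called.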

Before: I concatenate the RGB image (as a NumPy array) with the depth map

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        rgb_image_path, dep_image_path, cls_id = self.df.iloc[idx]
        # print(rgb_image_path)
        image = Image.open(rgb_image_path)
        if self.input_channels == 4:
            rgb_image = np.asarray(image)
            misc.imsave("trung.jpg", rgb_image)  # debug dump of the RGB image
            dep_image = np.load(dep_image_path)
            depth_colormap = cv2.applyColorMap(cv2.convertScaleAbs(dep_image, alpha=340), cv2.COLORMAP_JET)
            misc.imsave("trung1.jpg", depth_colormap)  # debug dump of the colorized depth map
            dep_image = np.expand_dims(dep_image, axis=-1)
            image = np.concatenate((rgb_image, dep_image), axis=-1)
            # print("before transform", image)  # numpy array

        if self._transform is not None:
            image = self._transform(image)
            # print("after transform", image)  # tensor
        # There is no need to transfer into one-hot encoding
        # label = torch.zeros(self._num_of_classes, dtype=torch.long).scatter_(0, torch.from_numpy(np.array(cls_id)), 1)
        # label = (np.arange(self._num_of_classes) == cls_id).astype(np.float32)
        # print(image)
        return image, cls_id

After: I use newly defined transform classes to resize and flip


class Resize(object):
    """Resize the image in a sample to a given size.

    ** Strongly recommended: use this class only when dealing with RGB-D images **

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, image):
        # PIL images fall back to the standard torchvision resize
        if isinstance(image, np.ndarray):
            h, w = image.shape[:2]
        else:
            return transforms.Resize(self.output_size)(image)
        if isinstance(self.output_size, int):
            # match the smaller edge and keep the aspect ratio
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size
        new_h, new_w = int(new_h), int(new_w)
        # skimage.transform.resize works with any number of channels, e.g. (256, 256, 4)
        image = transform.resize(image, (new_h, new_w)).astype(np.float32)
        print("shape after resize:", image.shape)
        return image


class RandomHorizontalFlip(object):
    """Horizontally flip the given Image randomly with a given probability.

    Args:
        p (float): probability of the image being flipped. Default value is 0.5
    """

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image):
        """
        Args:
            image (Image): Image to be flipped.

        Returns:
            Image: Randomly flipped image.
        """
        if not isinstance(image, np.ndarray):
            return transforms.RandomHorizontalFlip(p=self.p)(image)
        if random.random() < self.p:
            return np.fliplr(image).astype(np.float32)
        return image

    def __repr__(self):
        return self.__class__.__name__ + '(p={})'.format(self.p)
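
For context, these classes are meant to be composed like the standard transforms (a simplified sketch; the target size and exact order here are assumptions):

from torchvision import transforms

data_transform = transforms.Compose([
    Resize(299),                  # custom class above; handles 4-channel numpy arrays
    RandomHorizontalFlip(p=0.5),  # custom class above
    transforms.ToTensor(),        # HWC float array -> CHW tensor, keeps all 4 channels
])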

I think Inception v3 cannot receive 4-channel input here, but how can I make it work?


Please help me.

I don’t think the issue is in the model, but still in the input.
In your current code you are checking for 4 channels and are only concatenating the image arrays then:

if self.input_channels == 4:
    ...
    image = np.concatenate((rgb_image, dep_image), axis=-1)

which seems to be wrong. Maybe you want to check for 3 channels to concatenate the images to 4 channels?
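
You could also make this fail early with an assert right after the concatenation (a sketch using the names from your __getitem__):

image = np.concatenate((rgb_image, dep_image), axis=-1)
assert image.shape[-1] == 4, "expected 4 channels, got shape {}".format(image.shape)

This way a wrong condition would raise inside __getitem__ instead of in the model's forward pass.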

My code and result:

def normalize_depth_image(image):
    # scale the depth values to the [0, 1] range
    min_depth = np.min(image)
    return (image.astype(float) - min_depth) / (np.max(image) - min_depth)

# RGB image
original = Image.open('31.png')
RGB_image = np.array(original)

# depth map of the face
my_data = np.load('31.npy')
NN = normalize_depth_image(my_data)
output_data_N = (NN * 255).astype(np.uint8)
dep_image = np.expand_dims(output_data_N, axis=-1)

print('My RGB data:', RGB_image)
print('My depth map data of the face:', dep_image)
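
The concatenation itself looks like this (a sketch, using the same axis as in my dataset code above):

rgbd = np.concatenate((RGB_image, dep_image), axis=-1)
print(rgbd.shape)  # e.g. (H, W, 4)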


After concatenating I got this result:

My picture to train:
[Screenshot from 2021-07-06 10-54-43]

How can I fine-tune Inception to train with 4 input channels instead of 3?
Thanks for your help.

I’m unsure if you are still stuck at the previous problem or if you are facing another issue.
Could you describe your current problem a bit more, please?

My problem:
My project is about identity recognition. I use an Intel D435 camera to collect face images and their depth maps, following this paper (link) with my own dataset of students.
I train on one picture (.png) and one depth map (.npy) per face. I trained ResNet50 pretrained on ImageNet, using PyTorch; my dataset has 500 classes with 70 pictures per class. I got a model with 99.7% accuracy, but my problem is that the model only verifies data it was trained on in my dataset and does not verify real data.
So I want to switch to another CNN. I want to use Inception V3, and I changed the input from 3 channels to 4 channels (RGB image plus depth map), but I got this error when I train:

RuntimeError: Given groups=1, weight of size 64 4 3 3, expected input[16, 3, 299, 299] to have 4 channels, but got 3 channels instead

All the code I posted is in my first question above.
Thanks for your help.

Thanks for the update. As already pointed out, the error is caused by your input, so you would still have to check the aforementioned line of code, which seems to use a wrong condition.
Your code runs fine with the proper input (and an unrelated fix: the new conv layer needs 32 output channels, since that is what the following batch norm layer inside Conv2d_1a_3x3 expects, not 64):

def Inception(in_planes, out_planes, pretrained=False):
    if pretrained is True:
        model = models.inception_v3(pretrained=True)
        print("Pretrained model is loaded")
    else:
        model = models.inception_v3(pretrained=False)
    if in_planes == 4:
        # 32 output channels to match the BatchNorm2d inside Conv2d_1a_3x3
        model.Conv2d_1a_3x3.conv = nn.Conv2d(4, 32, kernel_size=3, stride=2, padding=2, bias=False)
        nn.init.kaiming_normal_(model.Conv2d_1a_3x3.conv.weight, mode='fan_out', nonlinearity='relu')
    # Parameters of newly constructed modules have requires_grad=True by default
    model.fc = nn.Linear(model.fc.in_features, out_planes)
    return model


model = Inception(in_planes=4, out_planes=1, pretrained=False)
x = torch.randn(2, 4, 299, 299)
out = model(x)
print(out.logits.shape)
> torch.Size([2, 1])

There is also a transform_input argument: when it is True (the default for pretrained models), the forward pass rebuilds the input from its first three channels only, so a 4-channel input is reduced to 3 channels again. You need to pass transform_input=False:

model = torchvision.models.inception_v3(pretrained=True, transform_input=False)
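
Putting both fixes together (a minimal sketch; pretrained=True downloads the ImageNet weights):

import torch
import torch.nn as nn
from torchvision import models

# keep all 4 input channels: transform_input=False disables the 3-channel re-normalization
model = models.inception_v3(pretrained=True, transform_input=False)
# swap the stem conv for a 4-channel version; 32 output channels match the existing batch norm
model.Conv2d_1a_3x3.conv = nn.Conv2d(4, 32, kernel_size=3, stride=2, bias=False)
nn.init.kaiming_normal_(model.Conv2d_1a_3x3.conv.weight, mode='fan_out', nonlinearity='relu')

x = torch.randn(2, 4, 299, 299)
out = model(x)  # model is in training mode, so the output includes the aux logits
print(out.logits.shape)
> torch.Size([2, 1000])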