Size mismatch error while testing RGBA images with resnet50

Here I have changed the ResNet model to take 4 input channels, since I use RGBA images rendered from 3D point clouds.

import torch.nn as nn


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=3, zero_init_residual=False):
        super(ResNet, self).__init__()
        self.inplanes = 64
        # first conv takes 4 input channels (RGBA) instead of the usual 3
        self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

I want to use RGBA images from 3D point clouds so that I can use some extra information like depth. So I have trained ResNet-50 with pretrained=False and the number of channels = 3.

The last code snippet you’ve posted contains code for training and testing a ResNet with 3 input channels.
Have you trained this model from scratch using some input data?
Do you want to add an additional channel to this manually pretrained model now and fine-tune it further?
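If that's the goal, one common approach (shown here only as a hypothetical sketch using torchvision's ImageNet weights for illustration, not code from this thread) is to copy the pretrained RGB filters into a new 4-channel stem conv and initialize the extra channel from their mean:

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)
new_conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = model.conv1.weight             # reuse pretrained RGB filters
    new_conv.weight[:, 3] = model.conv1.weight.mean(dim=1)  # init the alpha channel
model.conv1 = new_conv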

self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                           bias=False)

This shows I have trained with 4 channels … right?

Your earlier post shows you've trained using 3 channels.

So did you use the posted script to train your model, or did you use another one?

OK, now I got it.
Thanks, I will try changing this to 4 and hope this solves my problem. I hope these two are the only changes needed to use 4 channels:

model_ft = models.resnet50(pretrained=False)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 4)

and in the ResNet class:

self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                           bias=False)

Let me know if I'm completely misunderstanding something here.
Basically, if you train your model using 4 channels, it should work for both training and testing.
I'm currently not sure where the error is, so feel free to post some more information, as I have the strong feeling I'm misunderstanding you. :wink:

Sure, I will post updates. The thing is, training will take 3 days for 24 epochs, and only then can I do testing.

To just make sure your code is working, you won't have to train the model to perfection.
Just train it for some iterations and try your validation / testing code.
If your code works without any problems using the 4-channel input data, you are good to go!
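As an additional sanity check (a minimal sketch, assuming the modified 4-channel resnet50 from this thread), you can also push a random 4-channel batch through the model before starting a real run:

import torch

model = resnet50(pretrained=False)   # your 4-channel ResNet
x = torch.randn(2, 4, 224, 224)      # fake batch of 2 RGBA images
out = model(x)
print(out.shape)                     # expected: torch.Size([2, 3])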

Hey, here:
model_ft = models.resnet50(pretrained=False)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 3)

the 3 refers to the number of classes, right?
And I think the problem is with the PyTorch DataLoader, which needs a custom Dataset class for RGBA. Am I right?

No, the 3 refers to the number of output units.
ResNet is implemented in torchvision, so I would recommend copying the code, pasting it into your script, changing the first layer, and using this implementation instead of models.resnet50.
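(As a side note, if you don't need pretrained weights, a shorter alternative is to swap the first layer on the stock model directly; a minimal sketch:)

import torch.nn as nn
from torchvision import models

model_ft = models.resnet50(pretrained=False)
# replace the 3-channel stem conv with a 4-channel one
model_ft.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)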

Did you mean this? Sorry for pasting the whole code here, but I need to clarify it.

import torch.nn as nn
import torch.utils.model_zoo as model_zoo


__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
           'resnet152']


model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=3, zero_init_residual=False):
        super(ResNet, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x


def resnet50(pretrained=False, **kwargs):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
    return model

So I have to call this as
model_ft = resnet50(pretrained=False)
instead of the following:
model_ft = models.resnet50(pretrained=False)

Your code looks alright. Just to make sure it’s working, could you print the first conv layer after initializing your model and check the number of input channels?
print(model.conv1)

PS: you can add code snippets using three backticks ``` :wink: I've added them to your post for better readability.

Thanks for your response.
I have trained on a small sample input and am getting the following:

Conv2d(4, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
Epoch 0/4

RuntimeError: Given groups=1, weight of size [64, 4, 7, 7], expected input[2, 3, 224, 224] to have 4 channels, but got 3 channels instead

Are you sure the DataLoader need not be customized?

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=2,
                                              shuffle=True, num_workers=2)
               for x in ['train', 'val']}

Yes, your DataLoader should be fine.
However, since your inputs arrive with 3 channels, your Dataset might be removing the alpha channel, e.g. if you are using ImageFolder with the default loader, which converts every image to RGB. You could define your own loader to avoid this.
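A minimal sketch of such a loader (assuming your files are PNGs that actually carry an alpha channel; rgba_loader and the path are made-up names):

from PIL import Image
from torchvision import datasets

def rgba_loader(path):
    # keep the alpha channel instead of the default conversion to RGB
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGBA')

# hypothetical usage via ImageFolder's loader argument
image_dataset = datasets.ImageFolder('data_views/train', loader=rgba_loader)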

import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class_0_dir = '/home/raman/Classification/data_views/train/0/'
class_1_dir = '/home/raman/Classification/data_views/train/1/'
class_2_dir = '/home/raman/Classification/data_views/train/2/'

im_class_0 = os.listdir(class_0_dir)

im_class_1 = os.listdir(class_1_dir)
im_class_2 = os.listdir(class_2_dir)

# im_class_0 = np.array(im_class_0)
# im_class_1 = np.array(im_class_1)
# im_class_2 = np.array(im_class_2)

im_class_0_1 = [class_0_dir + t for t in im_class_0]
im_class_1_1 = [class_1_dir + t for t in im_class_1]
im_class_2_1 = [class_2_dir + t for t in im_class_2]

label_0 = np.ones(len(im_class_0_1))*0
label_1 = np.ones(len(im_class_1_1))*1
label_2 = np.ones(len(im_class_2_1))*2

train = im_class_0_1.copy()
train.extend(im_class_1_1)
train.extend(im_class_2_1)

label = np.concatenate((label_0,label_1,label_2))

class MyDataset(Dataset):
    def __init__(self, image_paths, targets, transform=None):
        self.image_paths = image_paths
        self.targets = targets
        self.transform = transform
        
    def __getitem__(self, index):
        x = Image.open(self.image_paths[index])
        y = self.targets[index]
        if self.transform:
            x = self.transform(x)
            
        return x, y
    
    def __len__(self):
        return len(self.image_paths)

dataset = MyDataset(train, label)
dataloaders = {x: torch.utils.data.DataLoader(dataset[x], batch_size=64,
                                            shuffle=True, num_workers=2)
                for x in ['train', 'label']}


data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406, 0.401], [0.229, 0.224, 0.225,0.222])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406, 0.401], [0.229, 0.224, 0.225, 0.222])
    ]),
}


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model_ft = resnet50(pretrained=False)

Where can I use the data transformations like resize? Is it done inside the class?

You can pass the data_transforms to your custom Dataset:

dataset = MyDataset(train, label, data_transforms['train'])

Your custom Dataset will apply these transformations in the __getitem__ method.
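As a quick check (a minimal sketch, assuming the train list and label array from your post), you can fetch a single sample and inspect its shape:

x, y = dataset[0]   # __getitem__ opens the image and applies the transform
print(x.shape)      # expected: torch.Size([4, 224, 224]) for an RGBA image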

PS: I’ve formatted your code for more readability. You can add code snippets using three backticks ``` :wink:

class MyDataset(Dataset):
    def __init__(self, image_paths, targets, transform=None):
        self.image_paths = image_paths
        self.targets = targets
        self.transform = transform
        # self.resizedCrop = transforms.RandomResizedCrop(224)
        # self.to_tensor = transforms.ToTensor()
        # self.normalize = transforms.Normalize([0.485, 0.456, 0.406, 0.401], [0.229, 0.224, 0.225,0.222])
        
    def __getitem__(self, index):
        x = Image.open(self.image_paths[index])
        y = self.targets[index]
        # data = self.center_crop(data)  # (2)
        # data = self.to_tensor(data)  # (2)
        
        if self.transform:
            x = self.transform(x)
            
        return x, y
    
    def __len__(self):
        return len(self.image_paths)


train_dataset = MyDataset(train, label, data_transforms['train'])
# val_dataset = MyDataset(val, label, data_transforms['val'])

dataloaders = {x: torch.utils.data.DataLoader(train_dataset[x], batch_size=64,
                                            shuffle=True, num_workers=2)
                for x in ['train', 'label']}

File "clas_resnet50_4c_1.py", line 93, in <module>
    for x in ['train', 'label']}
File "clas_resnet50_4c_1.py", line 93, in <dictcomp>
    for x in ['train', 'label']}
File "clas_resnet50_4c_1.py", line 72, in __getitem__
    x = Image.open(self.image_paths[index])
TypeError: list indices must be integers or slices, not str

Currently you’re using only a train_dataset, so your dataloaders dict won’t work.
If you want to create a train and val Dataset and DataLoaders, have a look at the transfer learning tutorial.
To fix the error in your current script, just use:

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
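If you do want both splits, a minimal sketch in the style of that tutorial (val and val_label are hypothetical lists/arrays you would build the same way as train and label):

train_dataset = MyDataset(train, label, data_transforms['train'])
val_dataset = MyDataset(val, val_label, data_transforms['val'])  # hypothetical val split

dataloaders = {
    'train': torch.utils.data.DataLoader(train_dataset, batch_size=64,
                                         shuffle=True, num_workers=2),
    'val': torch.utils.data.DataLoader(val_dataset, batch_size=64,
                                       shuffle=False, num_workers=2),
}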

How can I put both train and val in the same dataloaders dict here?