Size mismatch error while testing RGBA images with resnet50

Here I have changed the ResNet model to take 4 input channels, since I use RGBA images rendered from 3D point clouds.

import torch.nn as nn


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=3, zero_init_residual=False):
        super(ResNet, self).__init__()
        self.inplanes = 64
        # first conv takes 4 input channels (RGBA) instead of the usual 3
        self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

I want to use RGBA images from 3D point clouds so that I can use some extra information like depth. So I have trained ResNet-50 with pretrained=False and the number of channels = 3.

The last code snippet you’ve posted contains code for training and testing a ResNet with 3 input channels.
Have you trained this model from scratch using some input data?
Do you want to add an additional channel to this manually pretrained model now and fine-tune it further?
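If that's the goal, one common approach (shown here only as a hypothetical sketch using torchvision's ImageNet weights for illustration, not code from this thread) is to copy the pretrained RGB filters into a new 4-channel stem conv and initialize the extra channel from their mean:

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)
new_conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = model.conv1.weight             # reuse pretrained RGB filters
    new_conv.weight[:, 3] = model.conv1.weight.mean(dim=1)  # init the alpha channel
model.conv1 = new_conv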

self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                           bias=False)

This shows I have trained with 4 channels … right?

Your earlier post shows you've trained using 3 channels.

So did you use the posted script to train your model, or did you use another one?

OK, now I got it.
Thanks, I will try changing this to 4 and hope this solves my problem. I hope these two are the only changes needed to use 4 channels:

model_ft = models.resnet50(pretrained=False)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 4)

and in the ResNet class:

self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                           bias=False)

Let me know if I'm completely misunderstanding something here.
Basically, if you train your model using 4 channels, it should work for both training and testing.
I'm currently not sure where the error is, so feel free to post some more information, as I have the strong feeling I'm misunderstanding you. :wink:

Sure, I will post updates. The thing is, training will take 3 days for 24 epochs, and only then can I do testing.

To just make sure your code is working, you won't have to train the model to perfection.
Just train it for some iterations and try your validation / testing code.
If your code works without any problems using the 4-channel input data, you are good to go!
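As an additional sanity check (a minimal sketch, assuming the modified 4-channel resnet50 from this thread), you can also push a random 4-channel batch through the model before starting a real run:

import torch

model = resnet50(pretrained=False)   # your 4-channel ResNet
x = torch.randn(2, 4, 224, 224)      # fake batch of 2 RGBA images
out = model(x)
print(out.shape)                     # expected: torch.Size([2, 3])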

Hey, here:
model_ft = models.resnet50(pretrained=False)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 3)

the 3 refers to the number of classes, right?
And I think the problem is with the PyTorch DataLoader, which needs a custom Dataset class for RGBA. Am I right?

No, the 3 refers to the number of output units.
ResNet is implemented in torchvision, so I would recommend copying the code, pasting it into your script, changing the first layer, and using this implementation instead of models.resnet50.
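(As a side note, if you don't need pretrained weights, a shorter alternative is to swap the first layer on the stock model directly; a minimal sketch:)

import torch.nn as nn
from torchvision import models

model_ft = models.resnet50(pretrained=False)
# replace the 3-channel stem conv with a 4-channel one
model_ft.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)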

Did you mean this? Sorry for pasting the whole code here, but I need to clarify it.

import torch.nn as nn
import torch.utils.model_zoo as model_zoo


__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
           'resnet152']


model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=3, zero_init_residual=False):
        super(ResNet, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x


def resnet50(pretrained=False, **kwargs):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
    return model

So I have to call this as
model_ft = resnet50(pretrained=False)
instead of the following:
model_ft = models.resnet50(pretrained=False)

Your code looks alright. Just to make sure it’s working, could you print the first conv layer after initializing your model and check the number of input channels?
print(model.conv1)

PS: you can add code snippets using three backticks ``` :wink: I've added them to your post for better readability.

Thanks for your response.
I have trained on a small sample input and am getting the following:

Conv2d(4, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
Epoch 0/4

RuntimeError: Given groups=1, weight of size [64, 4, 7, 7], expected input[2, 3, 224, 224] to have 4 channels, but got 3 channels instead

Are you sure the DataLoader need not be customized?

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=2,
                                              shuffle=True, num_workers=2)
               for x in ['train', 'val']}

Yes, your DataLoader should be fine.
However, since your inputs arrive with 3 channels, your Dataset might be removing the alpha channel, e.g. if you are using ImageFolder with the default loader, which converts every image to RGB. You could define your own loader to avoid this.
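A minimal sketch of such a loader (assuming your files are PNGs that actually carry an alpha channel; rgba_loader and the path are made-up names):

from PIL import Image
from torchvision import datasets

def rgba_loader(path):
    # keep the alpha channel instead of the default conversion to RGB
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGBA')

# hypothetical usage via ImageFolder's loader argument
image_dataset = datasets.ImageFolder('data_views/train', loader=rgba_loader)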

import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class_0_dir = '/home/raman/Classification/data_views/train/0/'
class_1_dir = '/home/raman/Classification/data_views/train/1/'
class_2_dir = '/home/raman/Classification/data_views/train/2/'

im_class_0 = os.listdir(class_0_dir)

im_class_1 = os.listdir(class_1_dir)
im_class_2 = os.listdir(class_2_dir)

# im_class_0 = np.array(im_class_0)
# im_class_1 = np.array(im_class_1)
# im_class_2 = np.array(im_class_2)

im_class_0_1 = [class_0_dir + t for t in im_class_0]
im_class_1_1 = [class_1_dir + t for t in im_class_1]
im_class_2_1 = [class_2_dir + t for t in im_class_2]

label_0 = np.ones(len(im_class_0_1))*0
label_1 = np.ones(len(im_class_1_1))*1
label_2 = np.ones(len(im_class_2_1))*2

train = im_class_0_1.copy()
train.extend(im_class_1_1)
train.extend(im_class_2_1)

label = np.concatenate((label_0,label_1,label_2))

class MyDataset(Dataset):
    def __init__(self, image_paths, targets, transform=None):
        self.image_paths = image_paths
        self.targets = targets
        self.transform = transform
        
    def __getitem__(self, index):
        x = Image.open(self.image_paths[index])
        y = self.targets[index]
        if self.transform:
            x = self.transform(x)
            
        return x, y
    
    def __len__(self):
        return len(self.image_paths)

dataset = MyDataset(train, label)
dataloaders = {x: torch.utils.data.DataLoader(dataset[x], batch_size=64,
                                            shuffle=True, num_workers=2)
                for x in ['train', 'label']}


data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406, 0.401], [0.229, 0.224, 0.225,0.222])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406, 0.401], [0.229, 0.224, 0.225, 0.222])
    ]),
}


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model_ft = resnet50(pretrained=False)

Where can I use the data transformations like resize? Is it done inside the class?

You can pass the data_transforms to your custom Dataset:

dataset = MyDataset(train, label, data_transforms['train'])

Your custom Dataset will apply these transformations in the __getitem__ method.
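As a quick check (a minimal sketch, assuming the train list and label array from your post), you can fetch a single sample and inspect its shape:

x, y = dataset[0]   # __getitem__ opens the image and applies the transform
print(x.shape)      # expected: torch.Size([4, 224, 224]) for an RGBA image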

PS: I’ve formatted your code for more readability. You can add code snippets using three backticks ``` :wink:

class MyDataset(Dataset):
    def __init__(self, image_paths, targets, transform=None):
        self.image_paths = image_paths
        self.targets = targets
        self.transform = transform
        # self.resizedCrop = transforms.RandomResizedCrop(224)
        # self.to_tensor = transforms.ToTensor()
        # self.normalize = transforms.Normalize([0.485, 0.456, 0.406, 0.401], [0.229, 0.224, 0.225,0.222])
        
    def __getitem__(self, index):
        x = Image.open(self.image_paths[index])
        y = self.targets[index]
        # data = self.center_crop(data)  # (2)
        # data = self.to_tensor(data)  # (2)
        
        if self.transform:
            x = self.transform(x)
            
        return x, y
    
    def __len__(self):
        return len(self.image_paths)


train_dataset = MyDataset(train, label, data_transforms['train'])
# val_dataset = MyDataset(val, label, data_transforms['val'])

dataloaders = {x: torch.utils.data.DataLoader(train_dataset[x], batch_size=64,
                                            shuffle=True, num_workers=2)
                for x in ['train', 'label']}

File "clas_resnet50_4c_1.py", line 93, in <module>
    for x in ['train', 'label']}
File "clas_resnet50_4c_1.py", line 93, in <dictcomp>
    for x in ['train', 'label']}
File "clas_resnet50_4c_1.py", line 72, in __getitem__
    x = Image.open(self.image_paths[index])
TypeError: list indices must be integers or slices, not str

Currently you’re using only a train_dataset, so your dataloaders dict won’t work.
If you want to create a train and val Dataset and DataLoaders, have a look at the transfer learning tutorial.
To fix the error in your current script, just use:

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
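If you do want both splits, a minimal sketch in the style of that tutorial (val and val_label are hypothetical lists/arrays you would build the same way as train and label):

train_dataset = MyDataset(train, label, data_transforms['train'])
val_dataset = MyDataset(val, val_label, data_transforms['val'])  # hypothetical val split

dataloaders = {
    'train': torch.utils.data.DataLoader(train_dataset, batch_size=64,
                                         shuffle=True, num_workers=2),
    'val': torch.utils.data.DataLoader(val_dataset, batch_size=64,
                                       shuffle=False, num_workers=2),
}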

How can I put both train and val in the same dataloaders dict here?