Size mismatch error while testing RGBA images with resnet50

I have trained ResNet-50 with RGBA images using the following model definition:
class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=3, zero_init_residual=False):
        super(ResNet, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

Now I am getting the following error while testing the model:

... in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNet:
	size mismatch for conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 4, 7, 7]).

Kindly help me to solve this.

As the error message explains, your first conv layer uses a different number of in_channels than the one stored in the pretrained state_dict (4 vs. 3).

To fix this issue you could:

  • remove the alpha channel from your images, in case it's all 255 anyway (see the sketch after this list)
  • load the original model with 3 input channels and replace the first conv layer
  • do as described before, but fill the first 3 channels with the pretrained weights and leave the last channel randomly initialized (not sure if that'll work out)
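For the first option, a minimal sketch (assuming the alpha channel is the last of the four channels that ToTensor returns; not your exact pipeline, just an illustration):

import torchvision.transforms as transforms

rgb_only = transforms.Compose([
    transforms.ToTensor(),               # RGBA PIL image -> tensor of shape [4, H, W]
    transforms.Lambda(lambda t: t[:3]),  # keep only the R, G, B channels
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])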

Here I trained with pretrained=False and the number of input channels set to 4. I want to use the depth information in the alpha channel of the RGBA images. So how can I use the model for testing without this error?

If you don’t want to load the pretrained weights, there should be no problem in changing the input channels of the first conv layer. Do you get any error message using pretrained=False?

I didn't get an error using pretrained=False when I changed the number of channels to 4. Training took more time than with pretrained=True, but I am confused about the validation accuracies: I got almost the same 96.3% in both cases, pretrained=True and pretrained=False with 4 channels. So does that mean the 4th channel is not contributing much to the accuracy?

How did you use pretrained=True with 4 input channels?
Did you copy the pretrained 3 channels to your newly initialized first conv layer with 4 input channels?
Training a model from scratch is expected to take more time than using a pretrained model for most use cases.


I just changed the number of channels from 3 to 4 in the nn.Conv2d inside __init__, as follows:

def __init__(self, block, layers, num_classes=3, zero_init_residual=False):
    super(ResNet, self).__init__()
    self.inplanes = 64
    self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                           bias=False)
    self.bn1 = nn.BatchNorm2d(64)
    self.relu = nn.ReLU(inplace=True)

If you change the model definition, the pretrained state_dict cannot be loaded and you’ll get the error message:

RuntimeError: Error(s) in loading state_dict for ResNet:
	size mismatch for conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 4, 7, 7]).

I assume either your resnet definition wasn’t updated or you didn’t use the pretrained weights.

Yes … I got this error:
RuntimeError: Error(s) in loading state_dict for ResNet:
size mismatch for conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 4, 7, 7]).

How can I resolve it?

Here is my code which loads and tests the saved model:

data_dir = 'Classification'
# data_transforms as defined in the training script (posted further below)
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=100,
                                              shuffle=True, num_workers=24)
               for x in ['test']}
print(dataloaders)
dataset_sizes = {x: len(image_datasets[x]) for x in ['test']}
class_names = image_datasets['test'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

best_acc = 0.0
criterion = nn.CrossEntropyLoss()  # was missing in the posted snippet
since = time.time()                # was missing in the posted snippet

# Each epoch has a training and validation phase
for phase in ['test']:
    print("model")

    model = resnet50().to(device)
    model.load_state_dict(torch.load('resnet50_4c.pt'))
    model.eval()   # Set model to evaluate mode

    running_loss = 0.0
    running_corrects = 0

    # Iterate over data.
    for inputs, labels in dataloaders[phase]:
        inputs = inputs.to(device)
        labels = labels.to(device)

        with torch.set_grad_enabled(phase == 'test'):
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)

        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)

    # Compute the final statistics once all batches are processed
    loss = running_loss / dataset_sizes[phase]
    acc = running_corrects.double() / dataset_sizes[phase]
    print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, loss, acc))

    time_elapsed = time.time() - since
    print('Testing complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(acc))

I think the first suggestion won't work, as your alpha channel contains valid information, so you would have to apply one of the other two approaches.

This is my current model definition:

import torch.nn as nn
import torch.utils.model_zoo as model_zoo

# conv3x3, conv1x1 and model_urls are the standard torchvision helpers;
# they were omitted from the original post
model_urls = {
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
}

def conv3x3(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

def conv1x1(in_planes, out_planes, stride=1):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride,
                     bias=False)

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = conv1x1(inplanes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = conv3x3(planes, planes, stride)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = conv1x1(planes, planes * self.expansion)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=3, zero_init_residual=False):
        super(ResNet, self).__init__()
        self.inplanes = 64
        self.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

def resnet50(pretrained=False, **kwargs):
    """Constructs a ResNet-50 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
    if pretrained:
        model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
    return model

What changes should I try here?

These would be the first approaches.
I tried to add some comments which should explain the necessary steps.

# Training from scratch
model = models.resnet50(pretrained=False)
model.conv1 = nn.Conv2d(4, 64, 7, 2, 3, bias=False)

# Using pretrained weights and add a randomly initialized channel
model = models.resnet50(pretrained=True)
with torch.no_grad():
    # Just store the weight parameters, as conv1 does not use bias
    pretrained_conv1 = model.conv1.weight.clone()
    # Assign new conv layer with 4 input channels
    model.conv1 = nn.Conv2d(4, 64, 7, 2, 3, bias=False)
    # Use same initialization as vanilla ResNet (Don't know if good idea)
    nn.init.kaiming_normal_(
        model.conv1.weight, mode='fan_out', nonlinearity='relu')
    # Re-assign pretrained weights to first 3 channels
    # (assuming alpha channel is last in your input data)
    model.conv1.weight[:, :3] = pretrained_conv1
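
As a quick sanity check (a small sketch; the 1000-class output simply reflects the default ImageNet head, which you'd replace for your 3 classes):

x = torch.randn(2, 4, 224, 224)  # dummy 4-channel batch
out = model(x)
print(out.shape)  # torch.Size([2, 1000]) with the default classifier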

Peter, I have some difficulty understanding this. Is this for testing only, or should I make the changes in both the training and the testing parts?

The code snippet creates your model, which you should use for training as well as testing. What do you mean by "make the changes"? Are you using different models for training and testing?

No, the same model for training and testing. So you mean I need to train again from scratch? Training with my data takes 3 days for 24 epochs, so I was looking for changes only in the testing code. Is that possible?
The first 2 lines in your code I have already used in my training code, so I hope the training code needs no changes.

The second code part uses the pretrained model, which should be faster.

Anyway, could you explain a little what you have done so far? As far as I understand, you have already trained a model and would like to test it. Does this model have 4 input channels? What is the error in testing?

# Training code

import os, time, copy
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import datasets, models, transforms

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'data_views'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=100,
                                              shuffle=True, num_workers=24)
               for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model_ft = models.resnet50(pretrained=False)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 3)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

optimizer_ft = optim.Adam(model_ft.parameters(), lr=0.001)

exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

def train_model(model, criterion, optimizer, scheduler, num_epochs):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0

            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                optimizer.zero_grad()

                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                torch.save(model.state_dict(), 'resnet504c.pt')

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    model.load_state_dict(best_model_wts)
    return model

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)

# Testing code

data_transforms = {
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'data_views_test'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=100,
                                              shuffle=True, num_workers=24)
               for x in ['test']}
print(dataloaders)
dataset_sizes = {x: len(image_datasets[x]) for x in ['test']}
class_names = image_datasets['test'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

best_acc = 0.0
criterion = nn.CrossEntropyLoss()  # was missing in the posted snippet
since = time.time()                # was missing in the posted snippet

for phase in ['test']:
    print("model")

    model = models.resnet50(pretrained=False).to(device)
    model.load_state_dict(torch.load('resnet50_4c.pt'))

    model.eval()

    running_loss = 0.0
    running_corrects = 0

    for inputs, labels in dataloaders[phase]:
        inputs = inputs.to(device)
        labels = labels.to(device)

        with torch.set_grad_enabled(phase == 'test'):
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)

        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)

    # Compute the final statistics once all batches are processed
    loss = running_loss / dataset_sizes[phase]
    acc = running_corrects.double() / dataset_sizes[phase]
    print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, loss, acc))

    time_elapsed = time.time() - since
    print('Testing complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(acc))

You could also convert all your images to RGB. Sounds like you don’t want to do that, but it’s your best bet if you want to use pretrained weights.
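
A minimal sketch of that conversion, assuming you keep your ImageFolder pipeline (the loader argument replaces the default image loader):

from PIL import Image
from torchvision import datasets

def rgb_loader(path):
    # Force every image to 3 channels, i.e. drop the alpha channel
    return Image.open(path).convert('RGB')

dataset = datasets.ImageFolder('data_views/train',
                               transform=data_transforms['train'],
                               loader=rgb_loader)

If I remember correctly, torchvision's default pil_loader already calls img.convert('RGB'), so ImageFolder might silently have been feeding you 3-channel tensors all along.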

Based on this code it looks like you've used the plain ResNet with 3 input channels and also fed it 3-channel input, as your Normalize uses 3 values.

Probably I’m just a bit slow today, but would you like to load this trained model and add a 4th channel to it? Also, why does your data suddenly have another channel?
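
One way to settle this is to inspect the checkpoint itself (a small sketch using the checkpoint name from your post). If you really want a 4-channel pipeline, Normalize would also need 4 values; the 0.5 entries for alpha below are placeholders, not computed statistics:

state_dict = torch.load('resnet50_4c.pt', map_location='cpu')
# [64, 3, 7, 7] means the model was trained with 3 input channels,
# [64, 4, 7, 7] with 4 input channels
print(state_dict['conv1.weight'].shape)

normalize_rgba = transforms.Normalize([0.485, 0.456, 0.406, 0.5],
                                      [0.229, 0.224, 0.225, 0.5])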