Changing in_features in fc-layer for resnet18 in torchvision

Hello everyone,

I am new to torchvision and want to change the number of in_features for the fully-connected layer at the end of a resnet18:
resnet18 = torchvision.models.resnet18(pretrained=False)
resnet18.fc.in_features = 256

I want to do so as I want to use the CNN as a feature extractor, i.e. I want to generate a 256-dimensional embedding for each image.

When changing the code as outlined above, I receive no error. I want to train from scratch, so pretrained models are irrelevant for me.

However, I wonder whether my reasoning is sound and the implementation is correct. The change seems too small: the number of in_features should depend on the number of channels, which in turn depends on the number and type of filters, and I am not adjusting any of those.
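
For reference, a quick check makes me doubt that the assignment changes anything beyond a plain Python attribute (a minimal sketch):

import torchvision

resnet18 = torchvision.models.resnet18(pretrained=False)
resnet18.fc.in_features = 256

print(resnet18.fc.in_features)    # 256, but this is just an attribute ...
print(resnet18.fc.weight.shape)   # torch.Size([1000, 512]) -- the layer itself is unchanged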

Best,
always

That code by itself is definitely not enough to accomplish what you’re looking for. A few things to consider -

  1. Your model will still need a loss function in some context. I am not sure what your plan is for this, but it isn’t too important at the moment.
  2. When you say the embedding, do you mean the output of the model or the features going into the fully connected layer? Depending on your application this might be an important distinction. In the ResNet definition, there is no hidden layer between the average pooling and the output.
  3. The standard ResNet model in PyTorch doesn't support what you're looking for by default, but it is not a particularly hard change to make. See below.
import torch.nn as nn
from torchvision.models.resnet import BasicBlock

class ResNetEmbedding(nn.Module):
    def __init__(self, block, layers, embedding_dim=256):
        self.inplanes = 64
        super(ResNetEmbedding, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(512 * block.expansion, embedding_dim)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)

        return x

def resnet18(**kwargs):
    """Constructs a ResNet-18 embedding model.

    Keyword arguments (e.g. embedding_dim) are forwarded to ResNetEmbedding.
    """
    model = ResNetEmbedding(BasicBlock, [2, 2, 2, 2], **kwargs)
    return model

Note, a lot of this boilerplate can be avoided by simply subclassing ResNet, but this should be close to what you need. Please do heed my note about the output dimensionality of the ResNet model.
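
For example, the subclassing route could look roughly like this (a minimal sketch; the class name and the embedding_dim argument are just my own naming):

import torch.nn as nn
from torchvision.models.resnet import ResNet, BasicBlock

class ResNet18Embedding(ResNet):
    """ResNet-18 whose final linear layer produces an embedding instead of class logits."""

    def __init__(self, embedding_dim=256):
        # Build the standard ResNet-18 backbone (conv1 ... layer4, avgpool, fc).
        super().__init__(BasicBlock, [2, 2, 2, 2])
        # Replace the 512 -> 1000 classification head with a 512 -> embedding_dim projection.
        self.fc = nn.Linear(512 * BasicBlock.expansion, embedding_dim)

model = ResNet18Embedding(embedding_dim=256)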

Hey Andrew,

thanks for the quick response!

Regarding your aspects to consider:

  1. The loss should be calculated as usual, e.g. cross entropy between true and predicted label.

  2. I mean the output after global average pooling, i.e. the input to the fully connected layer. So it should just be a normally trained ResNet that is cut open before the FC layer and then yields a 256-dimensional embedding.

  3. Is your code equivalent to the following?

resnet18 = torchvision.models.resnet18(pretrained=False)
resnet18.fc.out_features = 256

If I understand your code correctly, this would create a 256-dimensional embedding, but it would not be trainable to be class-discriminative (unless I added another FC layer, which I do not want to do). In contrast, I want the embedding to come directly from the convolutional operations (downsampled by global average pooling).

Best,
always

I have been trying some additional stuff:

Rephrased problem: the dimensionality of the desired embedding (after average pooling and before the fully connected layer) corresponds directly to the number of channels, and those channels are determined by the convolutional layers.

Preliminary solution: I adjusted the input and output channels in the ResNet class:

class ResNet(nn.Module):

    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 32
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 32, layers[0])
        self.layer2 = self._make_layer(block, 64, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 128, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 256, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7, stride=1)
        self.fc = nn.Linear(256 * block.expansion, num_classes)

As I did not receive an error for my previous attempt either (i.e. without adjusting the dimensionality of the conv layers, see the first post), the absence of errors does not tell me much, and I cannot yet verify my solution. Maybe one of you can?
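
For reference, a shape check like the following should show whether the embedding really comes out 256-dimensional (a minimal sketch I have not run; it assumes the modified ResNet class above is complete and that BasicBlock is imported from torchvision.models.resnet):

import torch
from torchvision.models.resnet import BasicBlock

# Build the narrower network and grab the (flattened) avgpool output with a hook.
model = ResNet(BasicBlock, [2, 2, 2, 2], num_classes=10)
model.eval()

features = {}
model.avgpool.register_forward_hook(
    lambda module, inp, out: features.update(embedding=out.flatten(1))
)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))    # dummy ImageNet-sized input

print(features["embedding"].shape)        # expected: torch.Size([1, 256])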

Best,
Always

You should just use the output of the avgpool as your embedding. The output of fc is typically a classification or regression result.
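
For example, after training you can just drop the classification head and the forward pass will return the pooled features directly (a minimal sketch with a standard torchvision resnet18; in your narrowed variant the embedding would be 256-dimensional instead of 512):

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=False)
# ... train the full classifier as usual, then for feature extraction:
model.fc = torch.nn.Identity()            # forward() now stops at the flattened avgpool output

with torch.no_grad():
    embedding = model(torch.randn(1, 3, 224, 224))
print(embedding.shape)                    # torch.Size([1, 512]) for the standard resnet18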

Hi Andrew,
yes, that’s exactly what I am intending, i.e. training the entire net to be class-discriminative and then extracting the output of avgpool for a new image.

Ok so what is your question at this point?

Hi, Andrew! I would like to use ResNet50 to extract compact features (say, 256-dimensional). Can I add a new FC layer with 256 neurons between the avgpool layer and the output layer ("output layer" meaning the layer that outputs the classification results)?

Hi!
Maybe you can add a new FC layer with 256 neurons between the avgpool layer and the output layer (the layer that outputs the classification results).
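
Something along these lines should work (a minimal sketch; the ReLU in between is an assumption on my side, and the 256-dimensional features would be taken from the first Linear after training):

import torch
import torch.nn as nn
import torchvision

num_classes = 1000  # whatever your task requires

model = torchvision.models.resnet50(pretrained=False)
# resnet50's avgpool output is 2048-dimensional (512 * Bottleneck.expansion),
# so squeeze it down to 256 before the classification layer.
model.fc = nn.Sequential(
    nn.Linear(2048, 256),   # compact feature layer
    nn.ReLU(inplace=True),
    nn.Linear(256, num_classes),
)

out = model(torch.randn(1, 3, 224, 224))
print(out.shape)            # torch.Size([1, 1000])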

Is it OK?