`ValueError: optimizer got an empty parameter list` for a ResNet18 project with extra upsampling layers

I’m a new PyTorch user, and my goal is to port a hypercolumn semantic-segmentation network that I already built in Keras.

So I grafted some upsampling and convolution layers onto a pretrained ResNet18, and the graph seems to work: it can forward-pass images successfully.

But when I attempt to actually train the net, I run into problems…

optimizer = optim.Adam(net.parameters())

throws an error: ValueError: optimizing a parameter that doesn't require gradients

and optimizer = optim.Adam(filter(lambda x: x.requires_grad, list(net.parameters())))

throws an error

ValueError: optimizer got an empty parameter list

Here is my net architecture definition:

from torchvision.models.resnet import resnet18
from torch import nn

# Implementation of Hypercolumns for Object Segmentation and Fine-grained Localization
# by Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik
# https://arxiv.org/pdf/1411.5752.pdf

class HypercolumnResnet(nn.Module):
    def __init__(self):
        super(HypercolumnResnet, self).__init__()
        # spawn a Resnet18
        base_model = resnet18(pretrained=True)

        # freeze the weights
        for param in base_model.parameters():
            param.requires_grad = False

        # debugging
        # dog = list(base_model.children())
        # for i, item in enumerate(dog):
        #     print(i, item)

        # identify all the layers whose output we want to join into hypercolumns
        self.features_0 = nn.Sequential(*list(base_model.children())[:3])
        self.features_1 = nn.Sequential(*list(base_model.children())[:5])
        self.features_2 = nn.Sequential(*list(base_model.children())[:6])
        self.features_3 = nn.Sequential(*list(base_model.children())[:7])
        self.features_4 = nn.Sequential(*list(base_model.children())[:8])

    def forward(self, x):
        # get the half-resolution feature tensor
        half_size = self.features_0(x)

        # make all feature tensors have the same channel dimensionality (64) so that we can sum them
        quarter_size = nn.Conv2d(64, 64, 1)(self.features_1(x))
        eighth_size = nn.Conv2d(128, 64, 1)(self.features_2(x))
        sixteenth_size = nn.Conv2d(256, 64, 1)(self.features_3(x))
        thirtysecond_size = nn.Conv2d(512, 64, 1)(self.features_4(x))

        # double the size of the quarter-resolution feature tensor
        double_quarter_size = nn.Upsample(scale_factor=2)(quarter_size)

        # quadruple the size of the eighth-resolution feature tensor
        quadrupled_eighth_size = nn.Upsample(scale_factor=4)(eighth_size)

        # octuple the size of the sixteenth-resolution feature tensor
        octupled_sixteenth_size = nn.Upsample(scale_factor=8)(sixteenth_size)

        # scale the thirty-second-resolution feature tensor up by 16
        sixteenified_thirtysecond_size = nn.Upsample(scale_factor=16)(thirtysecond_size)

        # merge
        combined = half_size + double_quarter_size + quadrupled_eighth_size + octupled_sixteenth_size + sixteenified_thirtysecond_size

        # one more conv layer to smooth things out
        conv = nn.Conv2d(64, 128, 3, padding=1)(combined)
        conv = nn.BatchNorm2d(128)(conv)
        conv = nn.ReLU()(conv)

        # output
        segmentation = nn.Conv2d(128, 2, 3, padding=1)(conv)
        segmentation = nn.Softmax2d()(segmentation)

        return segmentation
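For anyone hitting the same pair of errors, a quick diagnostic (my own addition, run against the class above) makes the cause visible: the Conv2d and BatchNorm2d layers created inside forward() never show up among the registered parameters.

net = HypercolumnResnet()

# every registered parameter belongs to the frozen ResNet backbone;
# the layers constructed inside forward() are absent from this list
for name, param in net.named_parameters():
    print(name, param.requires_grad)   # requires_grad is False for all of them

# so after filtering out the frozen weights, nothing is left for Adam
trainable = [p for p in net.parameters() if p.requires_grad]
print(len(trainable))   # 0 -> "optimizer got an empty parameter list"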

Somebody showed me the mistake I was making…

All those Conv2d and BatchNorm2d layers I was creating in the forward() method needed to be moved into the __init__() constructor. In general, any learnable layer should be constructed in __init__() and assigned as an attribute: nn.Module registers every submodule assigned that way, and that registration is how net.parameters() finds their weights. Layers built inside forward() are never registered (and get re-created with fresh weights on every pass), so the only parameters my net exposed were the frozen ResNet ones. That explains both errors: Adam first choked on the frozen, no-gradient parameters, and once I filtered those out there was nothing left.
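For reference, here’s a sketch of the module rewritten along those lines. The attribute names (conv_quarter, smooth, and so on) are my own choices, and F.interpolate stands in for the nn.Upsample calls (same default ‘nearest’ mode); the behavior is otherwise meant to match the original:

import torch.nn.functional as F
from torch import nn
from torchvision.models.resnet import resnet18

class HypercolumnResnet(nn.Module):
    def __init__(self):
        super(HypercolumnResnet, self).__init__()
        base_model = resnet18(pretrained=True)
        for param in base_model.parameters():
            param.requires_grad = False

        children = list(base_model.children())
        self.features_0 = nn.Sequential(*children[:3])
        self.features_1 = nn.Sequential(*children[:5])
        self.features_2 = nn.Sequential(*children[:6])
        self.features_3 = nn.Sequential(*children[:7])
        self.features_4 = nn.Sequential(*children[:8])

        # learnable layers now live in the constructor, so nn.Module
        # registers them and their weights appear in net.parameters()
        self.conv_quarter = nn.Conv2d(64, 64, 1)
        self.conv_eighth = nn.Conv2d(128, 64, 1)
        self.conv_sixteenth = nn.Conv2d(256, 64, 1)
        self.conv_thirtysecond = nn.Conv2d(512, 64, 1)
        self.smooth = nn.Conv2d(64, 128, 3, padding=1)
        self.smooth_bn = nn.BatchNorm2d(128)
        self.classifier = nn.Conv2d(128, 2, 3, padding=1)

    def forward(self, x):
        half_size = self.features_0(x)
        quarter_size = self.conv_quarter(self.features_1(x))
        eighth_size = self.conv_eighth(self.features_2(x))
        sixteenth_size = self.conv_sixteenth(self.features_3(x))
        thirtysecond_size = self.conv_thirtysecond(self.features_4(x))

        # non-trainable ops are fine here in forward()
        combined = (half_size
                    + F.interpolate(quarter_size, scale_factor=2)
                    + F.interpolate(eighth_size, scale_factor=4)
                    + F.interpolate(sixteenth_size, scale_factor=8)
                    + F.interpolate(thirtysecond_size, scale_factor=16))

        conv = F.relu(self.smooth_bn(self.smooth(combined)))

        # channel-wise softmax, equivalent to nn.Softmax2d
        return F.softmax(self.classifier(conv), dim=1)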

N.B. It’s still fine to perform non-trainable operations (activations, reshapes, upsampling, and so on) in the forward() method, for example via torch.nn.functional.
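With the learnable layers registered in the constructor, the filtered optimizer call from the top of the post works as expected:

from torch import optim

net = HypercolumnResnet()

# the frozen backbone is filtered out; Adam receives only the
# hypercolumn layers' weights
optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()))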