"make_layers" in torchvision ResNet implementation

yuqli · February 3, 2019, 1:22am

I’m trying to write a decoder that upsamples an image from a latent vector but has a network structure similar to ResNet.

In PyTorch’s implementation of ResNet I found the following function, which I find confusing:

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

Specifically, my questions are

What is block.expansion? I believe it comes from the BasicBlock class, which is also in the source code, but I can’t grasp its meaning. Is it the expansion of the number of input channels?
What does the “if” condition translate to?

Thanks!

vmirly1 · February 3, 2019, 3:56am

block.expansion is defined as a class attribute of the block class. It is just an integer that indicates how much the number of feature maps expands through a convolution layer.

Inside the block there is a convolution layer whose number of input feature maps is planes and whose number of output feature maps is planes * expansion.
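To make the arithmetic concrete, here is a minimal pure-Python sketch (no torch needed). The class and function names are hypothetical stand-ins, not torchvision code; the expansion values 1 and 4 are the ones torchvision uses for BasicBlock and Bottleneck respectively.

```python
# Hypothetical stand-ins for torchvision's block classes; only the
# class attribute "expansion" is modelled here.
class BasicBlockSketch:
    expansion = 1  # output channels == planes


class BottleneckSketch:
    expansion = 4  # output channels == 4 * planes


def output_channels(block, planes):
    """Number of feature maps a block produces for a given planes value."""
    return planes * block.expansion


print(output_channels(BasicBlockSketch, 64))
print(output_channels(BottleneckSketch, 64))
```

So planes is better read as the block's base width, and expansion as the factor that turns that width into the block's actual output channel count.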

The if-statement decides whether the first block of this layer needs a downsample module on its shortcut connection. One is created if either of two conditions holds:

stride is not 1
the current number of input feature maps (self.inplanes) is not equal to planes * block.expansion

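The first condition is obviously related to downsampling: a stride other than 1 changes the spatial size. The second covers the case where the number of input feature maps and the number of output feature maps are not equal. In either case the input can no longer be added to the block's output as-is, so the 1x1 convolution and batch norm bring it to the matching shape.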
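As a rough illustration, the condition can be traced through the four stages of a ResNet without building any layers. This is only a sketch: needs_downsample and trace_stages are hypothetical helper names, and the stage settings (planes 64, 128, 256, 512 with strides 1, 2, 2, 2, starting from inplanes = 64) are assumed from the standard torchvision ResNet constructor rather than taken from this thread.

```python
def needs_downsample(inplanes, planes, expansion, stride):
    """Mirror of the if-condition in _make_layer."""
    return stride != 1 or inplanes != planes * expansion


def trace_stages(expansion):
    """Report whether the first block of each stage gets a downsample module."""
    inplanes = 64  # assumed channel count after the stem
    stages = [(64, 1), (128, 2), (256, 2), (512, 2)]  # assumed (planes, stride)
    results = []
    for planes, stride in stages:
        results.append(needs_downsample(inplanes, planes, expansion, stride))
        # Same update as "self.inplanes = planes * block.expansion"
        inplanes = planes * expansion
    return results


print(trace_stages(expansion=1))  # BasicBlock-style blocks
print(trace_stages(expansion=4))  # Bottleneck-style blocks
```

With expansion = 1 the first stage needs no downsample module, because the stride is 1 and 64 input channels already equal 64 * 1. With expansion = 4 even the first stage needs one, since 64 input channels have to become 256. The remaining blocks inside a stage are built with the default stride of 1 and with inplanes already equal to planes * expansion, so they never get one.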