Residual blocks with 1 layer per block?


I am currently reading the ResNet paper, and I noticed that their residual blocks always contain two convolutions. As I understand it, the first convolution maps the input channels to the channel count of the residual block (when the channel dimension changes between consecutive residual blocks), while the second convolution keeps the channel dimension fixed. When there is no change in dimension between two residual blocks, both convolutions keep the input channel dimension. Why is it actually necessary to have two convolutions in the same block? One could just use a single convolution:

xconv = nn.Conv2d(in_channels, out_channels, kernel_size=(k,k))(x_in)

and for the convolution corresponding to the skip connection for computing the residual, one would have:

xskip = nn.Conv2d(in_channels, out_channels, kernel_size=(1,1))(x_in)

Then, the output of the residual block:

x = xconv + xskip

Is the reason for having blocks with two (or more) layers just a design choice, or is there a specific reason to use at least two conv layers (and not one) per residual block?

Skip connections are just elementwise additions and are computationally cheap, so I can’t imagine that computational cost is the explanation.

Best, JZ

I’m not sure if you are referring to BasicBlock and Bottleneck defined here, but note that:

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

cannot be replaced with a single convolution.

Yes, referring to that.
The layers are defined like:

self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)

If it's just about getting the right shape of out, one could replace it with a single convolution:

conv = conv3x3(inplanes,planes,stride)

But you mean that the ReLU nonlinearity in between changes that?

Best, JZ

Right, if you are only concerned about getting the shape right, a single layer would do it. However, the computation performed by multiple layers with non-linearities between them would not be the same, so you might lose the actual training properties of these blocks.
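To make the point about non-linearities concrete, here is a toy sketch in plain Python. It uses a 1D, single-channel convolution as a stand-in for nn.Conv2d, and all helper names (conv1d, relu, compose_kernels) are made up for illustration. Under those assumptions it shows three things: two stacked convolutions without a ReLU collapse into one convolution, the ReLU in between breaks that equivalence, and a single convolution plus a linear 1x1 skip is itself just one convolution.

```python
# Toy 1D, single-channel stand-in for nn.Conv2d (hypothetical helper names).

def conv1d(x, kernel):
    """Valid cross-correlation of x with kernel (no padding, stride 1)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(x):
    return [max(0.0, v) for v in x]

def compose_kernels(k1, k2):
    """Kernel of the single conv equal to conv1d(conv1d(x, k1), k2)."""
    out = [0.0] * (len(k1) + len(k2) - 1)
    for i, a in enumerate(k1):
        for j, b in enumerate(k2):
            out[i + j] += a * b
    return out

x = [1.0, -2.0, 3.0, -1.0, 2.0, 0.5, -3.0]
k1 = [1.0, -1.0, 0.5]
k2 = [0.5, 2.0, -1.0]

# Two convs with nothing in between are one (wider) conv.
stacked_linear = conv1d(conv1d(x, k1), k2)
single_equiv = conv1d(x, compose_kernels(k1, k2))

# With a ReLU in between, no single kernel reproduces the result in general.
stacked_relu = conv1d(relu(conv1d(x, k1)), k2)

# One conv plus a linear "1x1" skip (scalar weight w, aligned with the
# kernel centre) is again one conv: w is simply added to the centre tap.
w = 0.3
one_conv_block = [c + w * x[i + 1] for i, c in enumerate(conv1d(x, k1))]
merged = conv1d(x, [k1[0], k1[1] + w, k1[2]])

print(stacked_linear[0], single_equiv[0], stacked_relu[0])
```

So in this toy setting a one-convolution block with a convolutional skip adds nothing over a plain convolution unless a non-linearity sits inside the residual branch. Whether a conv + ReLU residual branch with a single layer trains as well as the two-layer block is an empirical question this sketch does not answer.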

ok, thanks for clarifying!

Hey once more,

so I continued looking at the ResNet code. The downsampling module is defined as:

downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

This means that when the stride s > 1, only every s-th entry of the input x is passed along the skip connection. Isn’t this a loss of valuable information? Is this what actually happens? If so, it might be better to downsample with a pooling operation, e.g. average pooling with stride=s, before the conv1x1, and then apply the conv1x1 with stride=1.

Best, JZ

Sure, your idea sounds valid, and some experiments might be worth trying out.
Let us know if you see any improvement :wink:
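As a starting point for such an experiment, here is a toy sketch in plain Python of the two skip-path variants. It is 1D and single-channel, the 1x1 convolution is reduced to a scalar weight w, and the function names strided_1x1 and avgpool_then_1x1 are hypothetical. It only illustrates which input entries each variant can see, not how either behaves in training.

```python
# Toy 1D, single-channel comparison of the two skip-path downsampling ideas.

def strided_1x1(x, w, s):
    """1x1 conv (scalar weight w) with stride s: reads only x[0], x[s], x[2s], ..."""
    return [w * x[i] for i in range(0, len(x), s)]

def avgpool_then_1x1(x, w, s):
    """Average pooling (window s, stride s), then a stride-1 1x1 conv."""
    pooled = [sum(x[i:i + s]) / s for i in range(0, len(x) - s + 1, s)]
    return [w * p for p in pooled]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Change an entry that the strided conv skips over (index 1 with stride 2).
x_perturbed = list(x)
x_perturbed[1] = 100.0

strided_before = strided_1x1(x, 2.0, 2)
strided_after = strided_1x1(x_perturbed, 2.0, 2)
pooled_before = avgpool_then_1x1(x, 2.0, 2)
pooled_after = avgpool_then_1x1(x_perturbed, 2.0, 2)

print(strided_before == strided_after)  # the strided skip never sees x[1]
print(pooled_before == pooled_after)    # the pooled skip does
```

One practical detail to watch when moving this to nn.AvgPool2d: for odd spatial sizes, a stride-s 1x1 convolution and an unpadded pooling layer produce different output sizes, so the pooling would need padding or ceil_mode=True for the skip and main branches to line up.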

Yes, I will do that!