# Residual blocks with 1 layer per block?

Hey,

I am currently reading the ResNet paper, and I noticed that their residual blocks always contain two convolutions. I see that the first convolution maps the input channels to the desired channel number of the residual block (if the channel dimension changes between subsequent residual blocks), while the second convolution keeps the channel dimension fixed. When there is no change in dimension between two residual blocks, both keep the input channel dimension fixed. I was wondering: why is it actually necessary to have two convolutions in the same block? One could just have a single convolution:

```python
xconv = nn.Conv2d(in_channels, out_channels, kernel_size=(k, k))(x_in)
```

and for the convolution corresponding to the skip connection for computing the residual, one would have:

```python
xskip = nn.Conv2d(in_channels, out_channels, kernel_size=(1, 1))(x_in)
```

Then the output of the residual block would be:

```python
x = xconv + xskip
```

Is the reason for having blocks with two (or more) layers just a design choice, or is there a specific reason to use at least two conv layers (and not one) per residual block?

Skip connections are just elementwise additions and computationally cheap, so I can't imagine that computational cost is the explanation.
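To make the idea concrete, here is a rough sketch of the single-convolution block I have in mind (assuming a 3x3 kernel with matching padding so the spatial size stays fixed, and a 1x1 projection on the skip path when the channel count changes):

```python
import torch
import torch.nn as nn

class OneConvResidualBlock(nn.Module):
    """Sketch of the proposed block: one 3x3 conv plus a 1x1 skip projection."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # single "main path" convolution; padding=1 keeps the spatial size
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        # 1x1 projection so the skip matches the new channel count
        self.skip = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.conv(x) + self.skip(x)

x = torch.randn(1, 16, 8, 8)
block = OneConvResidualBlock(16, 32)
print(block(x).shape)  # torch.Size([1, 32, 8, 8])
```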

Thanks!
Best, JZ

I’m not sure if you are referring to `BasicBlock` and `Bottleneck` defined here, but note that:

```python
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)

out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
```

cannot be replaced with a single convolution.

Yes, referring to that.
The layers are defined like:

```python
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = norm_layer(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = norm_layer(planes)
```

If it's just about getting the right shape of `out`, one could replace it with a single convolution:

```python
conv = conv3x3(inplanes, planes, stride)
```

no?
Or do you mean that the ReLU nonlinearity in between changes that?

Best, JZ

Right, if you are only concerned about getting the shape right, a single layer would do it. However, the actual processing of multiple layers with nonlinearities between them would not be the same, so you might lose the training properties of these blocks.
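As a quick illustration (using 1x1 convolutions so the algebra is easy to see): without the ReLU, two stacked convolutions collapse into a single linear map, so one conv with the right weights reproduces them exactly; with the ReLU in between, that is no longer possible. This is just a toy check, not the ResNet code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# two stacked 1x1 convs (no bias) -- each is just a per-pixel matrix multiply
conv1 = nn.Conv2d(4, 8, kernel_size=1, bias=False)
conv2 = nn.Conv2d(8, 4, kernel_size=1, bias=False)

# a single 1x1 conv whose weight is the product of the two weight matrices
merged = nn.Conv2d(4, 4, kernel_size=1, bias=False)
with torch.no_grad():
    w1 = conv1.weight.squeeze(-1).squeeze(-1)  # shape (8, 4)
    w2 = conv2.weight.squeeze(-1).squeeze(-1)  # shape (4, 8)
    merged.weight.copy_((w2 @ w1).unsqueeze(-1).unsqueeze(-1))

x = torch.randn(2, 4, 5, 5)

# without a nonlinearity, the two convs equal the merged single conv ...
linear_out = conv2(conv1(x))
print(torch.allclose(linear_out, merged(x), atol=1e-5))  # True

# ... but with a ReLU in between, they no longer do
relu_out = conv2(torch.relu(conv1(x)))
print(torch.allclose(relu_out, merged(x), atol=1e-5))  # False
```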

ok, thanks for clarifying!

Hey once more,

so I continued looking at the ResNet code. The downsampling function is:

```python
downsample = nn.Sequential(
    conv1x1(self.inplanes, planes * block.expansion, stride),
    norm_layer(planes * block.expansion),
)
```

This means that when the stride s > 1, only every s-th entry of the input x is "transferred" via the skip connection. Isn't that a loss of valuable information? Is this what actually happens? If so, wouldn't it be better to apply a pooling function, e.g. average pooling with stride s, before the conv1x1 to take care of the downsampling, and then perform the conv1x1 with stride 1?
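Concretely, the variant I have in mind would look something like this (just a sketch of the idea, not the torchvision code; the layer names are my own):

```python
import torch
import torch.nn as nn

stride = 2
inplanes, outplanes = 64, 128

# torchvision-style downsample: the strided 1x1 conv only ever looks at
# every stride-th spatial position of the input
downsample_strided = nn.Sequential(
    nn.Conv2d(inplanes, outplanes, kernel_size=1, stride=stride, bias=False),
    nn.BatchNorm2d(outplanes),
)

# proposed variant: average-pool first so every input position contributes,
# then project the channels with a stride-1 conv1x1
downsample_pooled = nn.Sequential(
    nn.AvgPool2d(kernel_size=stride, stride=stride),
    nn.Conv2d(inplanes, outplanes, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(outplanes),
)

x = torch.randn(1, inplanes, 16, 16)
print(downsample_strided(x).shape)  # torch.Size([1, 128, 8, 8])
print(downsample_pooled(x).shape)   # torch.Size([1, 128, 8, 8])
```

Both produce the same output shape, so the pooled version could drop in as a replacement for the skip-connection downsampling.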

Thanks!
Best, JZ

Sure, your idea sounds valid, and it might be worth running some experiments to try it out.
Let us know if you see any improvement!

Yes, I will do that!