Help with ResNet BasicBlock architecture

Hello there.
I’m currently taking a deep learning course and trying to implement a ResNet. While doing so, I came across this implementation on GitHub:

import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    def __init__(self, in_planes, planes):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(planes)

        self.conv2 = nn.Conv2d(planes, planes, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(planes)

        # identity shortcut by default; projection shortcut if the channels change
        self.shortcut = nn.Sequential()
        if in_planes != planes:
            self.shortcut = nn.Sequential(nn.Conv2d(in_planes, planes, 3, padding=1),
                                          nn.BatchNorm2d(planes))

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # add the (possibly projected) input
        out = F.relu(out)
        return out

I don’t understand the shortcut. The if statement means the identity shortcut is only used when the channels of two adjacent layers are the same; if they differ, the shortcut just runs another conv + batchnorm block instead.

Later in the code, however, the ResNet is built like this:

class SmallResNet(nn.Module):
    def __init__(self, in_channel, hidden_channels, num_classes):
        super(SmallResNet, self).__init__()
        self.conv = nn.Conv2d(in_channel, hidden_channels[0], 3, padding=1)  # first conv
        self.bn = nn.BatchNorm2d(hidden_channels[0])  # then batchnorm
        # now use 3 residual blocks
        self.res1 = BasicBlock(hidden_channels[0], hidden_channels[1])
        self.res2 = BasicBlock(hidden_channels[1], hidden_channels[2])
        self.res3 = BasicBlock(hidden_channels[2], hidden_channels[3])
        # now do the max pooling
        self.maxpool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(hidden_channels[3] * 16 * 16, num_classes)  # from max pooling

And they instantiate it with these hidden channels:

hidden = [16, 32, 64, 128]

So it seems to me like this misses the whole point of ResNet, because there are no “skips” here.
What am I missing?

Thanks.

If the number of channels differs, the additional conv and batchnorm layers in shortcut make sure that the residual connection can still be added back to out.
On the other hand, if the channels already match, x will be added directly to out, since an empty nn.Sequential module acts as an identity:

import torch
import torch.nn as nn

seq = nn.Sequential()
x = torch.randn(1, 3, 24, 24)
out = seq(x)
print((out == x).all())
> tensor(True)
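
To illustrate the other case, here is a minimal sketch (the channel sizes are made up for illustration) showing that the projection shortcut keeps the addition valid when the channel counts differ:

# hypothetical channel sizes chosen only for illustration
block = BasicBlock(in_planes=16, planes=32)
x = torch.randn(1, 16, 24, 24)
out = block(x)    # the shortcut projects 16 -> 32 channels, so out += self.shortcut(x) works
print(out.shape)
> torch.Size([1, 32, 24, 24])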

The torchvision implementation uses a similar approach, but skips the downsample layer if it’s not needed.
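
Roughly, the relevant part of torchvision’s BasicBlock.forward looks like this (a simplified sketch, not the exact source):

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # only apply the projection when it was created (channels or stride change)
        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out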

Thank you very much! So just to make sure I understood: when the net is called with
hidden = [16, 32, 64, 128]
no identity shortcuts take place; instead the conv-batchnorm block is applied a second time (as the shortcut). Right?

The hidden list is only used to initialize the layers with the right number of channels; no forward pass happens in this call.
Let me know if I misunderstood the question.
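
For example (the in_channel and num_classes values here are made up):

hidden = [16, 32, 64, 128]
model = SmallResNet(in_channel=3, hidden_channels=hidden, num_classes=10)
# only the modules are built here; the shortcut additions happen later,
# when the model is actually applied to data in its forward pass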

What I mean is: when the hidden layers were initialized as
hidden = [16, 32, 64, 128]
the identity was never used, because the channels differ between every two adjacent layers.
So I’m wondering, with this initialization, what’s the role of the ResNet? It seems to me like it misses the whole point.

Thanks for clarifying the question.
I’m not familiar with the model and would suggest asking the author.
My best guess is that the author intended to write the model in a flexible way, and this particular configuration happens to use the conv-batchnorm shortcut blocks.
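
For instance (a hypothetical configuration, just to illustrate the flexibility), choosing repeated channel sizes would make some blocks use the plain identity shortcut:

# hypothetical channel sizes; wherever in_planes == planes, the empty
# nn.Sequential() shortcut is used, i.e. a plain identity skip
hidden = [16, 16, 32, 32]   # res1 and res3 would use the identity shortcut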
