My make_resblock function does not work

(李志) #1

I use a residual block and stack it many times to build my network, so I defined a function that builds the block from a list of layers:

def make_resblock(in_channels,out_channels,kernel_size,stride,padding):
    return nn.Sequential(*res)

# in init function

It seems to run fine during model construction, but it makes the network untrainable: backpropagation appears to be blocked by this residual block and the loss never decreases. When I remove the block from the forward pass, the network trains. And when I replace my original function with this,

           nn.Conv2d(256, 256, 3, 1, 1),
           nn.Conv2d(256, 256, 3, 1, 1),

it works and the residual block runs well. I want to know what to change in my function and why, since I will use the residual block many times and do not want to redeclare it each time I use it.

(Alban D) #2

The difference compared to when you create it by hand is that in the first case, both convs and both batchnorms have the same parameters! That could be a problem for training.
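One way to confirm the sharing is to compare the layers inside the block by identity and to count the distinct parameters. The sketch below is only illustrative: the channel sizes are placeholders, and building the block from a repeated list is an assumption about what the original function does, since its full body is not shown here.

```python
import torch.nn as nn

# Hypothetical layers; the sizes are placeholders, not taken from the original model.
layers = [nn.Conv2d(8, 8, 3, 1, 1), nn.BatchNorm2d(8), nn.ReLU()]
shared = nn.Sequential(*(layers + layers))  # the same three objects listed twice

separate = nn.Sequential(
    nn.Conv2d(8, 8, 3, 1, 1), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Conv2d(8, 8, 3, 1, 1), nn.BatchNorm2d(8), nn.ReLU(),
)

def count_params(module):
    # parameters() yields each distinct tensor once, so shared weights are not double counted
    return sum(p.numel() for p in module.parameters())

print(shared[0] is shared[3])      # identity check on the two conv positions
print(separate[0] is separate[3])
print(count_params(shared), count_params(separate))
```

If the two conv positions are the same object, the block has only half the parameters you would expect from six independent layers.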

(李志) #4

So how should I modify my code? This has bothered me for a long time and I have no idea how to fix it.

(Alban D) #5

Depends on what you want to do. If you don’t want the parameter sharing, then your second sample

def make_resblock(in_channels, out_channels, kernel_size, stride, padding):
    return nn.Sequential(
           nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
           nn.Conv2d(out_channels, out_channels, kernel_size, stride, padding),
    )

is what you want.
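If you would rather keep building the block from a list, a minimal sketch could look like the following. It assumes each repetition is conv, batchnorm, ReLU (based on the modules mentioned in this thread); the name `make_resblock_unshared` and the `repeats` argument are hypothetical and not part of the original code. It only builds the stacked layers, so the skip connection would still be added in the forward pass.

```python
import torch
import torch.nn as nn

def make_resblock_unshared(in_channels, out_channels, kernel_size, stride, padding, repeats=2):
    layers = []
    channels = in_channels
    for _ in range(repeats):
        # Construct new module objects on every iteration so nothing is reused.
        layers += [
            nn.Conv2d(channels, out_channels, kernel_size, stride, padding),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        ]
        channels = out_channels
    return nn.Sequential(*layers)

# Small placeholder sizes, chosen only to keep the example light.
block = make_resblock_unshared(16, 16, 3, 1, 1)
x = torch.randn(2, 16, 8, 8)
y = block(x)
print(y.shape)
```

Because the layers are created inside the loop, every call and every repetition gets its own weights and its own batchnorm statistics.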

If you do want the parameter sharing, then your code is good but it does not seem to work in practice.

(李志) #6

I do not want parameter sharing. Could you please tell me why my version was wrong?


Why does my make_block always return the same block? Thanks.

(Alban D) #7

When you do res += res at the end of your make_block function, you append the content of the list res to the list res itself. That means the first element is added again as the 4th element, and so on, so the 1st and 4th, 2nd and 5th, and 3rd and 6th entries are the same objects. You only have 1 conv, 1 bn and 1 relu module, and each of them is called twice in your sequential. This effectively does parameter sharing within your block: the two convs and the two BNs share the same parameters (and, for the BN, the same statistics).
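The same aliasing can be reproduced with plain Python objects, without PyTorch. In this small sketch the `Layer` class is a hypothetical stand-in for a module; only object identity and the call count matter.

```python
class Layer:
    """Hypothetical stand-in for an nn.Module."""

    def __init__(self, name):
        self.name = name
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x

res = [Layer("conv"), Layer("bn"), Layer("relu")]
res += res  # extends the list with references to the same three objects

for layer in res:
    layer(None)  # run the "sequential" once

print(len(res))                 # 6 entries in the list
print(len(set(map(id, res))))   # 3 distinct objects
print(res[0].calls)             # 2: the single conv was called twice
```

Creating the layers twice (for example by calling a small factory two times and concatenating the results) gives six distinct objects instead.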