Differences between writing models with nn.Sequential() vs. creating a class?

I am proficient enough to read PyTorch code and reimplement it to fit my own needs, but being self-taught there is still a lot I do not understand. Truth be told, I didn't do much OOP before learning PyTorch; I mainly wrote many functions and chained them together to make my network work. Since I started looking at other people's code to learn PyTorch, I have noticed there are two major ways of writing networks: one is to stuff all the layers into nn.Sequential() and assign that to a model, the other is to define a class and assign that to the model. My question is: what is the major difference? I have tried both ways and, in my opinion, nn.Sequential is easier. I have also seen nn.Sequential defined within the model class as well.
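To make the question concrete, here is a minimal sketch of the two styles I mean (the layer names and sizes are made up purely for illustration):

    import torch.nn as nn

    # Style 1: stack layers in nn.Sequential and assign it to the model
    model_seq = nn.Sequential(
        nn.Linear(10, 32),
        nn.ReLU(),
        nn.Linear(32, 2),
    )

    # Style 2: derive from nn.Module and write the forward method yourself
    class MyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(10, 32)
            self.act = nn.ReLU()
            self.fc2 = nn.Linear(32, 2)

        def forward(self, x):
            return self.fc2(self.act(self.fc1(x)))

    model_cls = MyModel()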

7 Likes

You can use whatever fits your use case.
While some people like to use the nn.Sequential approach a lot, I usually just use it for small sub-modules and like to define my model in a functional way, i.e. derive from nn.Module and write the forward method myself.

This approach gives you more flexibility in my opinion, since you are able to easily create skip connections etc.
On the other hand if you just want to stack layers together, nn.Sequential is totally fine.
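As a rough sketch of the kind of flexibility I mean (layer names and sizes are made up), a skip connection is trivial in a custom forward but awkward to express as a plain nn.Sequential:

    import torch.nn as nn

    class SkipBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.act = nn.ReLU()

        def forward(self, x):
            # residual/skip connection: add the input back onto the conv output
            return self.act(x + self.conv2(self.act(self.conv1(x))))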

18 Likes

Is there a performance difference when one is used over the other?

There shouldn’t be any difference regarding performance, but let us know if you encounter something.

2 Likes

I have to create a model which has 3 parallel CNN branches. An image is fed into all 3 networks and the outputs of the three networks are finally concatenated.
Can I model this if I define all the CNN networks in different classes?

nn.Sequential can work for this if I define 3 different branches in the same class and concatenate them in the forward method.
But I want to know whether I can model such networks as three different classes and finally concatenate their outputs when I train them. Will there be a problem with backpropagation?

Thanks!

Yes, this should be possible.
You could create a “parent model” and pass or initialize the submodels in its __init__ method.
In the forward just call each submodel with the corresponding data and concatenate their outputs afterwards.
This won’t create any issues regarding backpropagation.
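A minimal sketch of that idea (CNN1/CNN2/CNN3 below stand in for your actual branch classes):

    import torch
    import torch.nn as nn

    class ParentModel(nn.Module):
        def __init__(self, branch1, branch2, branch3):
            super().__init__()
            # the three independently defined CNN sub-models
            self.branch1 = branch1
            self.branch2 = branch2
            self.branch3 = branch3

        def forward(self, x):
            # feed the same image into each branch
            out1 = self.branch1(x)
            out2 = self.branch2(x)
            out3 = self.branch3(x)
            # concatenate along the channel/feature dimension;
            # autograd tracks all three branches through torch.cat,
            # so backpropagation works as usual
            return torch.cat([out1, out2, out3], dim=1)

    # usage (hypothetical branch classes):
    # model = ParentModel(CNN1(), CNN2(), CNN3())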

2 Likes

Thankyou. This really helped!

1 Like

Recently, I encountered a performance issue when using nn.Sequential versus not using it. When I used nn.Sequential to define the bottleneck in EfficientNetV2, the code worked fine. However, when I wrote the bottleneck module functionally, the training accuracy improved but the validation accuracy got stuck at 30%.

The reason I want to write my module functionally is that I am replacing the conv2d weights with custom weights and do not want my custom weights to be overwritten during weight initialization.
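For context, the guarded initialization I have in mind looks roughly like this (the isinstance check on BlockLBP is just illustrative, not my exact init code):

    import torch.nn as nn

    def init_weights(module):
        # skip the custom layers so their hand-set weights are not overwritten
        if isinstance(module, BlockLBP):
            return
        if isinstance(module, nn.Conv2d):
            nn.init.kaiming_normal_(module.weight, mode='fan_out')
        elif isinstance(module, nn.BatchNorm2d):
            nn.init.ones_(module.weight)
            nn.init.zeros_(module.bias)

    # model.apply(init_weights)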

Below are the code snippets:

  1. Using nn.Sequential

    class MBConv(nn.Module):
        def __init__(self, inp, oup, stride, expand_ratio, use_se, sparsity, block_type, act_func, padding):
            super().__init__()
            assert stride in [1, 2]

            hidden_dim = round(inp * expand_ratio)
            self.identity = stride == 1 and inp == oup
            if use_se:
                self.conv = nn.Sequential(
                    # pw
                    nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                    nn.BatchNorm2d(hidden_dim),
                    SiLU(),
                    # dw
                    BlockLBP(hidden_dim, hidden_dim, kernel_size=5, stride=stride, sparsity=sparsity,
                             block_type=block_type, act_func=act_func, padding=2, groups=hidden_dim),
                    nn.BatchNorm2d(hidden_dim),
                    SiLU(),
                    SELayer(inp, hidden_dim),
                    # pw-linear
                    nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                    nn.BatchNorm2d(oup),
                )
            else:
                self.conv = nn.Sequential(
                    # fused
                    BlockLBP(inp, hidden_dim, kernel_size=5, stride=stride, sparsity=sparsity,
                             block_type=block_type, act_func=act_func, padding=2, groups=1),
                    nn.BatchNorm2d(hidden_dim),
                    SiLU(),
                    # pw-linear
                    nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                    nn.BatchNorm2d(oup),
                )

        def forward(self, x):
            if self.identity:
                return x + self.conv(x)
            else:
                return self.conv(x)
    
  2. Writing the bottleneck module functionally

    class MBConv(nn.Module):
        def __init__(self, inp, oup, stride, expand_ratio, use_se, sparsity, block_type, act_func, padding):
            super().__init__()
            assert stride in [1, 2]

            hidden_dim = round(inp * expand_ratio)
            self.identity = stride == 1 and inp == oup
            self.use_se = use_se
            if use_se:
                self.conv1x1 = nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False)
                self.bn1 = nn.BatchNorm2d(hidden_dim)
                self.act_func = SiLU()
                self.shape_conv = BlockLBP(hidden_dim, hidden_dim, kernel_size=5, stride=stride, sparsity=sparsity,
                                           block_type=block_type, act_func=act_func, padding=2, groups=hidden_dim)
                self.SE_layer = SELayer(inp, hidden_dim)
                self.conv_final = nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False)
                self.bn2 = nn.BatchNorm2d(oup)
            else:
                self.shape_conv = BlockLBP(inp, hidden_dim, kernel_size=5, stride=stride, sparsity=sparsity,
                                           block_type=block_type, act_func=act_func, padding=2, groups=1)
                self.bn1 = nn.BatchNorm2d(hidden_dim)
                self.act_func = SiLU()
                self.conv_final = nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False)
                self.bn2 = nn.BatchNorm2d(oup)

        def forward(self, x):
            iden = x
            if self.use_se:
                x = self.conv1x1(x)
                x = self.bn1(x)
                x = self.act_func(x)
                x = self.shape_conv(x)
                x = self.bn1(x)
                x = self.act_func(x)
                x = self.SE_layer(x)
                x = self.conv_final(x)
                x = self.bn2(x)
            else:
                x = self.shape_conv(x)
                x = self.bn1(x)
                x = self.act_func(x)
                x = self.conv_final(x)
                x = self.bn2(x)
            if self.identity:
                return x + iden
            else:
                return x
    

It would be great if I could get some insights about this.

Thanks!

In your functional approach you are reusing bn1 twice, while in the nn.Sequential approach you initialize a new batchnorm layer for each position. Fix this and also check whether the number of parameters and buffers matches between both approaches.

1 Like

Thank you ptrblck. You mentioned checking the buffers and the number of parameters; how do I check them?

You could use:

model_param_size = sum([p.nelement() for p in model.parameters()])

And the same for the buffers.
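For example, the buffer check could look like this (buffers hold e.g. the BatchNorm running_mean/running_var):

    model_buffer_size = sum([b.nelement() for b in model.buffers()])

If both sums match between the two implementations, the modules should contain the same layers.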

1 Like

Thank you, @ptrblck. Instead of reusing the same batch norm layer, I added a new one and started getting the correct values.
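For anyone hitting the same issue, here is a tiny self-contained illustration of why reusing the same batchnorm changes the parameter count (toy sizes, not my actual model):

    import torch.nn as nn

    shared = nn.BatchNorm2d(8)
    reused = nn.Sequential(shared, nn.ReLU(), shared)                          # same layer twice
    separate = nn.Sequential(nn.BatchNorm2d(8), nn.ReLU(), nn.BatchNorm2d(8))  # two distinct layers

    print(sum(p.nelement() for p in reused.parameters()))    # 16 (duplicates are deduplicated)
    print(sum(p.nelement() for p in separate.parameters()))  # 32

The reused layer also shares its running statistics between both positions, which presumably explains the odd validation accuracy.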

1 Like