I am proficient enough to read PyTorch code and reimplement it to fit my own needs, but being self-taught there are still a lot of things I do not understand. Truth be told, I didn't do much OOP at all before learning PyTorch; I mainly wrote many functions and chained them together to make my network work. Since I started looking at other people's code to learn PyTorch, I have noticed there are two major ways of writing networks. One way is to stuff all the layers into nn.Sequential() and assign that to a model, OR define a class and assign that to a model. My question is: what is the major difference? I have tried both ways, and IMO nn.Sequential is easier. I have also seen nn.Sequential defined within a model class as well.
You can use whatever fits your use case.
While some people like to use the nn.Sequential approach a lot, I usually just use it for small sub-modules and like to define my model in a functional way, i.e. derive from nn.Module and write the forward method myself. This approach gives you more flexibility in my opinion, since you are able to easily create skip connections etc. On the other hand, if you just want to stack layers together, nn.Sequential is totally fine.
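To make the trade-off concrete, here is a minimal sketch (toy layer sizes, not code from this thread): the nn.Sequential version is a plain stack, while the nn.Module subclass can do anything in its forward method, such as adding a skip connection.

```python
import torch
import torch.nn as nn

# A plain stack of layers: nn.Sequential is enough here.
seq_model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 8),
)

# A custom forward makes it easy to add a skip connection.
class SkipModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 8)

    def forward(self, x):
        out = torch.relu(self.fc1(x))
        return self.fc2(out) + x  # residual/skip connection

x = torch.randn(4, 8)
print(seq_model(x).shape)    # torch.Size([4, 8])
print(SkipModel()(x).shape)  # torch.Size([4, 8])
```

The skip connection in SkipModel is exactly the kind of thing that does not fit into a linear nn.Sequential stack without extra wrapper modules.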
Is there a performance improvement when one is used over the other?
There shouldn't be any difference regarding performance, but let us know if you encounter something.
I have to create a model with 3 parallel CNN networks. An image is fed into all 3 networks, and finally the outputs of the three networks are concatenated.
Can I model this if I define the CNN networks in different classes?
nn.Sequential can work for this if I define 3 different layers in the same class and concatenate them in the forward method.
But I want to know: can I model such networks as three different classes and concatenate their outputs when I train them? Will there be a problem with backpropagation?
Thanks!
Yes, this should be possible.
You could create a "parent model" and pass or initialize the submodels in its __init__ method. In the forward method, just call each submodel with the corresponding data and concatenate their outputs afterwards.
This won’t create any issues regarding backpropagation.
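A minimal sketch of this parent-model pattern (the SmallCNN branch, channel counts, and classifier head are made up for illustration):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """A tiny stand-in for one of the three parallel CNN branches."""
    def __init__(self, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(3, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

class ParentModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Initialize the three submodels in __init__.
        self.branch1 = SmallCNN(4)
        self.branch2 = SmallCNN(4)
        self.branch3 = SmallCNN(4)
        self.classifier = nn.Linear(12, 10)  # 3 branches * 4 channels

    def forward(self, x):
        # Feed the same image into all three branches and concatenate.
        out = torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x)], dim=1
        )
        out = out.mean(dim=(2, 3))  # global average pooling
        return self.classifier(out)

model = ParentModel()
x = torch.randn(2, 3, 8, 8)
out = model(x)            # shape: (2, 10)
out.sum().backward()      # gradients flow through all three branches
```

Since torch.cat is differentiable, autograd routes gradients back through each branch; no special handling is needed for backpropagation.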
Thank you. This really helped!
Recently, I encountered a performance issue when using nn.Sequential versus not using it. When I used nn.Sequential to define the bottleneck in EfficientNetV2, the code worked fine. However, when I tried writing the bottleneck module functionally, training performance would improve, but validation accuracy would get stuck at 30%.
The reason I want to write my module functionally is that I am replacing the conv2d weights with custom weights and do not want my custom weights to be overwritten during weight initialization.
Below is the snippet code:
- Using nn.Sequential:

```python
class MBConv(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, use_se, sparsity, block_type, act_func, padding):
        super(MBConv, self).__init__()
        assert stride in [1, 2]
        hidden_dim = round(inp * expand_ratio)
        self.identity = stride == 1 and inp == oup
        if use_se:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                # dw
                BlockLBP(hidden_dim, hidden_dim, kernel_size=5, stride=stride, sparsity=sparsity,
                         block_type=block_type, act_func=act_func, padding=2, groups=hidden_dim),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                SELayer(inp, hidden_dim),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # fused
                BlockLBP(inp, hidden_dim, kernel_size=5, stride=stride, sparsity=sparsity,
                         block_type=block_type, act_func=act_func, padding=2, groups=1),
                nn.BatchNorm2d(hidden_dim),
                SiLU(),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.identity:
            return x + self.conv(x)
        else:
            return self.conv(x)
```
- Writing the bottleneck module functionally:

```python
class MBConv(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, use_se, sparsity, block_type, act_func, padding):
        super(MBConv, self).__init__()
        assert stride in [1, 2]
        hidden_dim = round(inp * expand_ratio)
        self.identity = stride == 1 and inp == oup
        self.use_se = use_se
        if use_se:
            self.conv1x1 = nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False)
            self.bn1 = nn.BatchNorm2d(hidden_dim)
            self.act_func = SiLU()
            self.shape_conv = BlockLBP(hidden_dim, hidden_dim, kernel_size=5, stride=stride, sparsity=sparsity,
                                       block_type=block_type, act_func=act_func, padding=2, groups=hidden_dim)
            self.SE_layer = SELayer(inp, hidden_dim)
            self.conv_final = nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False)
            self.bn2 = nn.BatchNorm2d(oup)
        else:
            self.shape_conv = BlockLBP(inp, hidden_dim, kernel_size=5, stride=stride, sparsity=sparsity,
                                       block_type=block_type, act_func=act_func, padding=2, groups=1)
            self.bn1 = nn.BatchNorm2d(hidden_dim)
            self.act_func = SiLU()
            self.conv_final = nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False)
            self.bn2 = nn.BatchNorm2d(oup)

    def forward(self, x):
        iden = x
        if self.use_se:
            x = self.conv1x1(x)
            x = self.bn1(x)
            x = self.act_func(x)
            x = self.shape_conv(x)
            x = self.bn1(x)
            x = self.act_func(x)
            x = self.SE_layer(x)
            x = self.conv_final(x)
            x = self.bn2(x)
        else:
            x = self.shape_conv(x)
            x = self.bn1(x)
            x = self.act_func(x)
            x = self.conv_final(x)
            x = self.bn2(x)
        if self.identity:
            return x + iden
        else:
            return x
```
It would be great if I could get some insights about this.
Thanks!
In your functional approach you are reusing bn1 twice, while you are initializing a new batchnorm layer in the nn.Sequential approach. Fix this, and also check if the number of parameters and buffers matches between both approaches.
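A toy illustration of why the reuse matters (simplified stand-in modules, not the actual MBConv code): applying a single BatchNorm2d instance at two points in forward shares its affine parameters and running statistics between both call sites, so the module is genuinely different from one with two independent batchnorm layers.

```python
import torch.nn as nn

class ReusedBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(4)

    def forward(self, x):
        # The SAME layer (weights and running stats) is applied twice.
        return self.bn1(self.bn1(x))

class SeparateBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(4)
        self.bn_mid = nn.BatchNorm2d(4)  # independent second layer

    def forward(self, x):
        return self.bn_mid(self.bn1(x))

n_reused = sum(p.nelement() for p in ReusedBN().parameters())
n_separate = sum(p.nelement() for p in SeparateBN().parameters())
print(n_reused, n_separate)  # 8 16
```

The parameter-count mismatch (8 vs. 16 here) is exactly what comparing the two MBConv variants would reveal.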
Thank you, ptrblck. You mentioned checking the buffers and number of parameters; how do I check them?
You could use:
model_param_size = sum([p.nelement() for p in model.parameters()])
And the same for the buffers.
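Putting both together on a small example model (the model itself is just a hypothetical Conv2d + BatchNorm2d stack for illustration); note that BatchNorm's running statistics live in buffers, not parameters:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, bias=False),
    nn.BatchNorm2d(8),
)

model_param_size = sum(p.nelement() for p in model.parameters())
model_buffer_size = sum(b.nelement() for b in model.buffers())
print(model_param_size)   # 232: 8*3*3*3 conv weights + 8 bn weights + 8 bn biases
print(model_buffer_size)  # 17: running_mean (8) + running_var (8) + num_batches_tracked (1)
```

Running both counts on the nn.Sequential and the functional MBConv should expose any mismatch between the two implementations.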
Thank you, @ptrblck. Instead of reusing the same batchnorm layer, I added a new one and started getting the correct values.