When should I use nn.ModuleList and when should I use nn.Sequential?

I am new to PyTorch and one thing that I don’t quite understand is the usage of nn.ModuleList and nn.Sequential. Can I know when I should use one over the other? Thanks.


nn.ModuleList is just like a Python list. It was designed to store any desired number of nn.Module’s. It may be useful, for instance, if you want to design a neural network whose number of layers is passed as input:

class LinearNet(nn.Module):
  def __init__(self, input_size, num_layers, layers_size, output_size):
     super(LinearNet, self).__init__()

     self.linears = nn.ModuleList([nn.Linear(input_size, layers_size)])
     self.linears.extend([nn.Linear(layers_size, layers_size) for i in range(1, num_layers-1)])
     self.linears.append(nn.Linear(layers_size, output_size))
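The class above only builds the layers. Here is a minimal sketch of a complete version with a forward() method. The ReLU between layers and the sizes in the usage lines are assumptions for illustration, not part of the original post:

```python
import torch
import torch.nn as nn


class LinearNet(nn.Module):
    def __init__(self, input_size, num_layers, layers_size, output_size):
        super().__init__()
        # First layer, (num_layers - 2) hidden layers, then the output layer.
        self.linears = nn.ModuleList([nn.Linear(input_size, layers_size)])
        self.linears.extend(
            [nn.Linear(layers_size, layers_size) for _ in range(num_layers - 2)]
        )
        self.linears.append(nn.Linear(layers_size, output_size))

    def forward(self, x):
        # ModuleList does not chain anything itself, so we loop over it by hand.
        for layer in self.linears[:-1]:
            x = torch.relu(layer(x))  # assumed activation
        return self.linears[-1](x)


net = LinearNet(input_size=8, num_layers=4, layers_size=16, output_size=3)
out = net(torch.randn(5, 8))
print(len(net.linears), out.shape)
```

The number of layers is only known at construction time, which is exactly the situation where a ModuleList plus a loop in forward() is convenient.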

nn.Sequential allows you to build a neural net by specifying sequentially the building blocks (nn.Module’s) of that net. Here’s an example:

class Flatten(nn.Module):
  def forward(self, x):
    N, C, H, W = x.size() # read in N, C, H, W
    return x.view(N, -1)

simple_cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2),
            nn.ReLU(),
            Flatten(),
            nn.Linear(5408, 10),
          )

From what I see, it is interchangeable then? Unless there is some order to be followed, then we should use Sequential. Am I right?

Not really. Maybe there are some situations where you could use both, but the main idea is the following:

In nn.Sequential, the nn.Module's stored inside are connected in a cascaded way. For instance, in the example that I gave, I define a neural network that receives as input an image with 3 channels and outputs 10 neurons. That network is composed of the following blocks, in the following order: Conv2d -> ReLU -> Flatten -> Linear layer. Moreover, an object of type nn.Sequential has a forward() method, so if I have an input image x I can directly call y = simple_cnn(x) to obtain the scores for x. When you define an nn.Sequential you must be careful to make sure that the output size of a block matches the input size of the following block. Basically, it behaves just like an nn.Module.
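To make the size-matching point concrete, here is a small sketch. It assumes 32x32 RGB inputs (my assumption; it is the input size for which the 5408 input features of the Linear layer work out) and uses the built-in nn.Flatten in place of a hand-written Flatten module:

```python
import torch
import torch.nn as nn

simple_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=7, stride=2),  # 32x32 input -> 32 maps of 13x13
    nn.ReLU(),
    nn.Flatten(),                               # (N, 32, 13, 13) -> (N, 5408)
    nn.Linear(32 * 13 * 13, 10),                # 32 * 13 * 13 = 5408
)

x = torch.randn(4, 3, 32, 32)  # assumed batch of 4 RGB images
y = simple_cnn(x)              # Sequential provides forward(), so it is callable
print(y.shape)
```

With a different input resolution the Conv2d output would have a different spatial size, and the Linear layer would raise a shape mismatch error.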

On the other hand, nn.ModuleList does not implement a forward() method, because it does not define any neural network; that is, there is no connection between the nn.Module's that it stores. You may use it to store nn.Module's, just like you use Python lists to store other types of objects (integers, strings, etc). The advantage of using an nn.ModuleList instead of a conventional Python list to store nn.Module's is that PyTorch is “aware” of the existence of the nn.Module's inside an nn.ModuleList, which is not the case for Python lists. If you want to understand exactly what I mean, just try to redefine my class LinearNet using a Python list instead of an nn.ModuleList and train it. When defining the optimizer for that net, you’ll get an error saying that your model has no parameters, because PyTorch does not see the parameters of the layers stored in a Python list. If you use an nn.ModuleList instead, you’ll get no error.
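The experiment suggested above can be sketched like this. The class names and layer sizes are made up for illustration:

```python
import torch
import torch.nn as nn


class ListNet(nn.Module):  # hypothetical: layers kept in a plain Python list
    def __init__(self):
        super().__init__()
        self.linears = [nn.Linear(4, 4), nn.Linear(4, 2)]


class ModuleListNet(nn.Module):  # hypothetical: same layers in a ModuleList
    def __init__(self):
        super().__init__()
        self.linears = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])


n_list = len(list(ListNet().parameters()))
n_modlist = len(list(ModuleListNet().parameters()))
print(n_list, n_modlist)  # the list version exposes no parameters at all

try:
    torch.optim.SGD(ListNet().parameters(), lr=0.1)
    optimizer_error = None
except ValueError as err:
    optimizer_error = err  # the optimizer rejects an empty parameter list
print(optimizer_error)

# The ModuleList version can be handed to the optimizer without complaint.
optimizer = torch.optim.SGD(ModuleListNet().parameters(), lr=0.1)
```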


Hi dpernes, thanks for the great explanation. I noticed we can pass a ModuleList as an argument to Sequential, does that mean the layers in the ModuleList are then connected in the cascading way as they would be if we passed them as arguments to Sequential directly and unlisted?

If not, then how can we pass multiple layers as arguments to Sequential (like as a list or set) and have them cascade as they would if we passed them individually?

Hi @luca590!

Honestly, I am not sure if it is legal to pass a ModuleList directly to a Sequential, but you can give it a try in a toy example!

Anyway, if it does not work, there is a simple workaround. You may define an nn.Module whose __init__() method receives a ModuleList and connects each Module in the ModuleList sequentially. Then, simply add the defined Module to your Sequential. Did you get the idea or do you want me to write some sample code for it?

EDIT: After taking a brief look at the implementation of nn.Sequential(), I would say that passing it a ModuleList() probably works :) Please give it a try and then update me :P

Hey, thanks for the super fast response. I get the idea, simple enough. I guess I could also define the forward method in the class that receives a ModuleList and adds sequentially?

So yeah, it doesn’t complain if you pass Sequential a ModuleList, but then it gives a NotImplementedError when you do a forward pass. So my code looks like:
a = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
b = nn.ModuleList([a])
c = nn.Sequential(b)

The other thing is I noticed you can pass a list of class objects to nn.Sequential like this guy does here implementing DenseNet in the _make_dense function at the bottom. Why exactly does this work?

Yeah, if it gives a NotImplementedError, then maybe we’ll have that functionality in the future…

I guess I could also define the forward method in the class that receives a ModuleList and adds sequentially?

Actually, you should do that. I omitted that part in my explanation for brevity.
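Here is a minimal sketch of that wrapper. The class name Chain and the layer sizes are made up for illustration:

```python
import torch
import torch.nn as nn


class Chain(nn.Module):  # hypothetical name
    def __init__(self, layers):
        super().__init__()
        # layers is an nn.ModuleList, so its parameters are registered here too.
        self.layers = layers

    def forward(self, x):
        # Connect the stored modules one after another.
        for layer in self.layers:
            x = layer(x)
        return x


blocks = nn.ModuleList([nn.Linear(6, 6), nn.ReLU(), nn.Linear(6, 2)])
model = nn.Sequential(nn.Linear(3, 6), Chain(blocks))
out = model(torch.randn(5, 3))
print(out.shape)
```

Because Chain defines forward(), the Sequential can call it like any other layer.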

The other thing is I noticed you can pass a list of class objects to nn.Sequential like this guy does here implementing DenseNet in the _make_dense function. Why exactly does this work?

That’s a clever implementation, I would say! Notice that he is not passing the list layers directly to the nn.Sequential, but rather the content of the list layers. That’s why he does nn.Sequential(*layers) instead of nn.Sequential(layers). If you’re not familiar with the usage of a * before a tuple object, please refer to https://stackoverflow.com/questions/11315010/what-do-and-before-a-variable-name-mean-in-a-function-signature.
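A small sketch of the same pattern, in case it helps. The helper make_mlp and its sizes are hypothetical; this is not the DenseNet code itself:

```python
import torch
import torch.nn as nn


def make_mlp(sizes):  # hypothetical helper
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        layers.append(nn.Linear(n_in, n_out))
        layers.append(nn.ReLU())
    # The * unpacks the list, so this is nn.Sequential(layers[0], layers[1], ...)
    return nn.Sequential(*layers)


mlp = make_mlp([8, 16, 4])
out = mlp(torch.randn(2, 8))
print(len(mlp), out.shape)
```

The list is only a temporary container here. Once it is unpacked, the Sequential owns the layers and registers their parameters.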

Hope this helps :)


Oh I see, yeah that helps a lot. Thanks!

The reason you are seeing NotImplementedError is that ModuleList is really meant to be used like a list. There is no forward operation defined on it, and I doubt there will be in the future.
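For anyone who wants to reproduce this, a short sketch follows. The input size is an arbitrary assumption:

```python
import torch
import torch.nn as nn

a = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
b = nn.ModuleList([a])
x = torch.randn(1, 1, 8, 8)  # assumed input size

try:
    nn.Sequential(b)(x)  # construction succeeds, the forward pass does not
    raised = False
except NotImplementedError:
    raised = True
print(raised)

# Unpacking the ModuleList hands its layers to Sequential one by one.
c = nn.Sequential(*b)
y = c(x)
print(y.shape)
```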


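Sorry for replying to such an old thread, but I found an interesting use case where nn.ModuleList kinda saved me. Basically, if you have a module with a variable number of layers and keep them in a plain Python list, their parameters are not registered, whereas nn.ModuleList registers them: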

import numpy as np
import torch as tr
import torch.nn as nn

def getNumParams(params):
	numParams, numTrainable = 0, 0
	for param in params:
		npParamCount = np.prod(param.data.shape)
		numParams += npParamCount
		if param.requires_grad:
			numTrainable += npParamCount
	return numParams, numTrainable

# Using list
class Module1(nn.Module):
	def __init__(self, dIn, dOut, numLayers):
		super(Module1, self).__init__()
		self.layers = []
		for i in range(numLayers - 1):
			self.layers.append(nn.Conv2d(in_channels=dIn, out_channels=dIn, kernel_size=1))
		self.layers.append(nn.Conv2d(in_channels=dIn, out_channels=dOut, kernel_size=1))

	def forward(self, x):
		y = x
		for i in range(len(self.layers)):
			y = self.layers[i](y)
		return y

# Using nn.ModuleList
class Module2(nn.Module):
	def __init__(self, dIn, dOut, numLayers):
		super(Module2, self).__init__()
		self.layers = nn.ModuleList()
		for i in range(numLayers - 1):
			self.layers.append(nn.Conv2d(in_channels=dIn, out_channels=dIn, kernel_size=1))
		self.layers.append(nn.Conv2d(in_channels=dIn, out_channels=dOut, kernel_size=1))

	def forward(self, x):
		y = x
		for i in range(len(self.layers)):
			y = self.layers[i](y)
		return y

def main():
	x = tr.randn(1, 7, 30, 30)

	module1 = Module1(dIn=7, dOut=13, numLayers=10)
	y1 = module1(x)
	print(y1.shape) # (1, 13, 30, 30)
	print(getNumParams(module1.parameters())) # Prints (0, 0)

	module2 = Module2(dIn=7, dOut=13, numLayers=10)
	y2 = module2(x)
	print(getNumParams(module2.parameters())) # Print (608, 608)
	print(y2.shape) # (1, 13, 30, 30)

if __name__ == "__main__":
	main()

Just my 2c on how this feature saved me, as my code checks for params count when loading network weights :)


nn.ModuleList comes in handy when writing many DL models, for example when coding a Maxout network as defined in the paper Maxout Networks (https://arxiv.org/pdf/1302.4389.pdf).

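import torch
import torch.nn as nn
import torch.nn.functional as F

class maxout_mlp(nn.Module):
    def __init__(self, num_units=2):
        super(maxout_mlp, self).__init__()  # required before registering submodules
        self.fc1_list = nn.ModuleList()
        self.fc2_list = nn.ModuleList()
        for _ in range(num_units):
            # The loop body was cut off in the post. The sizes below are assumptions:
            # 784 matches the view() in forward(); 1024 and 10 are placeholders.
            self.fc1_list.append(nn.Linear(784, 1024))
            self.fc2_list.append(nn.Linear(1024, 10))

    def forward(self, x):
        x = x.view(-1, 784)
        x = self.maxout(x, self.fc1_list)
        x = F.dropout(x, training=self.training)
        x = self.maxout(x, self.fc2_list)
        return F.log_softmax(x, dim=1)

    def maxout(self, x, layer_list):
        max_output = layer_list[0](x)  # pass x to the first unit
        for layer in layer_list[1:]:   # element-wise max with the remaining units
            max_output = torch.max(layer(x), max_output)
        return max_output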

Thanks for posting this!
My understanding is that the 2 versions only differ when you look at the parameter count, but the results for y1 and y2 would be the same (assuming same seed for random initialization), correct?
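One way to check this is sketched below, with a hypothetical ConvStack class standing in for Module1 and Module2. It re-seeds the generator before building each version, so both should draw the same initial weights:

```python
import torch
import torch.nn as nn


class ConvStack(nn.Module):  # hypothetical stand-in for Module1 / Module2
    def __init__(self, d_in, d_out, num_layers, use_module_list):
        super().__init__()
        layers = [nn.Conv2d(d_in, d_in, kernel_size=1) for _ in range(num_layers - 1)]
        layers.append(nn.Conv2d(d_in, d_out, kernel_size=1))
        # Only the ModuleList variant registers the layers with the parent module.
        self.layers = nn.ModuleList(layers) if use_module_list else layers

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


torch.manual_seed(0)
m1 = ConvStack(7, 13, 10, use_module_list=False)
torch.manual_seed(0)
m2 = ConvStack(7, 13, 10, use_module_list=True)

x = torch.randn(1, 7, 30, 30)
same_output = torch.allclose(m1(x), m2(x))
n1 = sum(p.numel() for p in m1.parameters())
n2 = sum(p.numel() for p in m2.parameters())
print(same_output, n1, n2)
```

So the forward pass itself behaves the same. What differs is everything that relies on registration: parameters() for the optimizer, state_dict() for saving and loading, and moving the model with .to(device) or .cuda().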

So the only difference between Sequential and ModuleList is that Sequential does not have an append method, which does not allow you to add layers in a for loop.
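For what it's worth, layers can be added to an nn.Sequential in a loop as well, through add_module(), which every nn.Module has. In the small sketch below the layer names and sizes are arbitrary assumptions. It suggests that the more important difference is elsewhere: Sequential chains its children in forward(), while ModuleList only stores them.

```python
import torch
import torch.nn as nn

seq = nn.Sequential()
for i in range(3):
    # add_module registers a child under the given name, one per iteration.
    seq.add_module("linear%d" % i, nn.Linear(4, 4))

x = torch.randn(2, 4)
y = seq(x)  # the three layers are applied one after another
print(len(seq), y.shape)
```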

Stupid question: why use ModuleList instead of a normal Python list? Is it so that parameters are included in the .parameters() iterator?


Exactly! If you use a plain Python list, the parameters won’t be registered properly and you can’t pass them to your optimizer using model.parameters().


The question has already been answered several times, but I want to share my experience, which may help you think about a practical case.

First, I want to mention again that nn.Sequential stores layers and already implements a forward method in which the layers are applied in a cascaded way. The point is, you don’t always want layers to be cascaded. In my case, what I needed was to concatenate the outputs of CNN layers with different kernel sizes.

Here is the paper I tried to implement: “Convolutional Neural Networks for Sentence Classification”. As a starting point I found an implementation on GitHub:

class CNNSentence(nn.Module):
        def __init__(self, args, data, vectors):
                super(CNNSentence, self).__init__()
                for filter_size in args.FILTER_SIZES:
                       # some arguments of this call are missing from the excerpt
                       conv = nn.Conv1d(self.in_channels,
                                        args.word_dim * filter_size,
                                        ...)
                       setattr(self, 'conv_' + str(filter_size), conv)

        def forward(self, batch):
                conv_result = [
                        F.max_pool1d(F.relu(getattr(self, 'conv_' + str(filter_size))(conv_in)),
                                     seq_len - filter_size + 1).view(-1, self.args.num_feature_maps)
                        for filter_size in self.args.FILTER_SIZES]

                out = torch.cat(conv_result, 1)


However, I skipped setting the convolutional layers as attributes while rewriting the model. Later, while transferring the network to the GPU, I got an error and realized that the convolutional layers were not part of my network. My failed code is below:

class CNN_Sentence(nn.Module):
    def __init__(self, ..., ngram_filter_sizes=[3, 4, 5], ...):
        super(CNN_Sentence, self).__init__()
        self.convs = []
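        for ngram_filter in ngram_filter_sizes:
            conv = nn.Conv1d(embedding_size, ...)  # remaining arguments are missing from the post
            self.convs.append(conv)  # plain Python list: conv is never registered

    def forward(self, batch):
        x = []
        for conv in self.convs:
            conv_out = conv(batch)
            max_pool_kernel = conv_out.shape[2]
            conv_out = F.max_pool1d(F.relu(conv_out), max_pool_kernel)
            x.append(conv_out.view(batch_size, -1))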

I think it is a good example of why we need nn.ModuleList and why it is different from nn.Sequential.
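A minimal sketch of the same idea with nn.ModuleList follows. The class name, the default sizes and the num_filters parameter are all assumptions for illustration, not the code from the post or from the GitHub repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiKernelCNN(nn.Module):  # hypothetical name and sizes
    def __init__(self, embedding_size=16, num_filters=8, ngram_filter_sizes=(3, 4, 5)):
        super().__init__()
        # ModuleList registers every conv, so parameters() and .to(device) see them.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embedding_size, num_filters, kernel_size=k)
             for k in ngram_filter_sizes]
        )

    def forward(self, batch):
        # batch has shape (batch_size, embedding_size, seq_len)
        pooled = []
        for conv in self.convs:
            conv_out = F.relu(conv(batch))
            # Max over time: pool across the whole remaining sequence length.
            conv_out = F.max_pool1d(conv_out, conv_out.shape[2])
            pooled.append(conv_out.view(batch.shape[0], -1))
        # The branches run side by side and are concatenated, not cascaded.
        return torch.cat(pooled, dim=1)


model = MultiKernelCNN()
features = model(torch.randn(2, 16, 20))
print(features.shape)
```

An nn.Sequential would not fit here, because every conv receives the same input instead of the previous layer's output.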

You can also find the entry-point discussion of nn.ModuleList, which helped me discover the class.


I am trying to port an existing PyTorch model from Python to C++ using the libtorch API.
A couple of blocks in the model are declared as nn.ModuleList().
How can I implement the same in C++?
Below is a dummy snippet of the Python code.

self.initial_layer = DummyConv(in_channels, growth_rate*num_layers, dilation=1,
                               kernel_size=kernel_size, pad=pad)
self.layers = nn.ModuleList()
for i in range(1, num_layers):
    self.layers.add_module('layer%s' % i, DummyConv(growth_rate, growth_rate*(num_layers-i), dilation=i,
                                                    kernel_size=kernel_size, pad=i))

def forward(self, x):
    out = self.initial_layer(x)
    for i, layer in enumerate(self.layers):
        out[:,(i+1)*self.growth_rate:] += layer(out[:,i*self.growth_rate:(i+1)*self.growth_rate].contiguous())

    return out[:,-self.growth_rate:]

The question was when, and the answer may be: in cases where you need a dynamic module structure and you don’t know in advance how it will look.
The original example provided is fair:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

This has nothing to do with dynamic graph creation, which PyTorch also does.

If I am not wrong, some module must define a forward method, so a ModuleList will be part of a class that evaluates it in that class’s forward.

Some also call using a class derived from nn.Module a functional approach.

An nn.ModuleList can be a child of nn.Sequential, and in that case we need a class inside the Sequential that aggregates it.