Access weights of a specific module in nn.Sequential()


This should be a quick one, but I wasn’t able to figure it out myself.
When I use a pre-defined module in PyTorch, I can typically access its weights fairly easily.
However, how do I access them if I wrapped the module in nn.Sequential() first?
Please see toy example below.

class My_Model_1(nn.Module):
    def __init__(self,D_in,D_out):
        super(My_Model_1, self).__init__()
        self.layer = nn.Linear(D_in,D_out)
    def forward(self,x):
        out = self.layer(x)
        return out

class My_Model_2(nn.Module):
    def __init__(self,D_in,D_out):
        super(My_Model_2, self).__init__()
        self.layer = nn.Sequential(nn.Linear(D_in,D_out))
    def forward(self,x):
        out = self.layer(x)
        return out

model_1 = My_Model_1(10,10)
model_2 = My_Model_2(10,10)
# How do I print the weights now?
# model_2.layer.0.weight doesn't work.

Many thanks.
Any help much appreciated.


Hi! Many thanks, this is what I was looking for. Was trying the wrong braces.
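For anyone landing here later, the bracket indexing mentioned above can be sketched roughly as follows. nn.Sequential supports integer indexing, so the wrapped module is reached with square brackets rather than dot notation. The names linear, weight and bias are only illustrative.

```python
import torch.nn as nn


class My_Model_2(nn.Module):
    def __init__(self, D_in, D_out):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(D_in, D_out))

    def forward(self, x):
        return self.layer(x)


model_2 = My_Model_2(10, 10)

# Index into the nn.Sequential container with square brackets
linear = model_2.layer[0]
weight = linear.weight  # nn.Parameter of shape (D_out, D_in)
bias = linear.bias      # nn.Parameter of shape (D_out,)

print(weight.shape, bias.shape)
```

model_2.layer.0.weight fails because 0 is not a valid Python attribute name, while model_2.layer[0] goes through the container's indexing.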

When I try this, I get the error "The model has no attribute ‘layer’".

layer was defined in __init__:

self.layer = nn.Sequential(nn.Linear(D_in,D_out))

You have to use the variable name defined in your model.



Is there any way in PyTorch to access the layers of a model, and the weights in each layer, without typing the layer name? Something like model.layers in Keras, which is discussed in the following:
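A minimal sketch of one way to do this in PyTorch, assuming a small nn.Sequential model as a stand-in (the model and the names child_names and param_shapes are illustrative, not from the thread). named_children() iterates over the direct submodules, and named_parameters() walks all parameters recursively with dotted names.

```python
import torch.nn as nn

# Hypothetical stand-in model
model = nn.Sequential(
    nn.Linear(10, 5),
    nn.ReLU(),
    nn.Linear(5, 1),
)

# Direct submodules, roughly comparable to iterating over model.layers in Keras
child_names = [name for name, _ in model.named_children()]

# All parameters, collected recursively, keyed by their dotted names
param_shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}

for name, shape in param_shapes.items():
    print(name, shape)
```

Modules without parameters (such as nn.ReLU) show up in named_children() but contribute nothing to named_parameters().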


When I use this method with model.eval(), I get different weight values for the same example.
Does that mean my model isn’t working correctly?

Hi ptrblck

Happy to find you here.
I am building a network with 2 CNN layers and 3 FC layers, and using dropout twice.
My neural network is defined as follows. Do you see anything wrong with it? I appreciate your feedback.

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import TensorDataset, DataLoader, random_split
import torch.optim as optim
import torch.nn as nn
from torch.nn import functional as F
import matplotlib.pyplot as plt
from torch.autograd import Variable

class ConvNetRedo1(nn.Module):
    # numf1, numf2: number of filters in the first and second conv layers
    # fz1, fz2: kernel sizes of the first and second conv layers
    # nn2, nn3: output sizes of the first and second fully connected layers
    def __init__(self, numf1, numf2, fz1, fz2, nn2, nn3):
        super(ConvNetRedo1, self).__init__()
        self.layer1 = nn.Sequential(nn.Conv3d(1, numf1, kernel_size=fz1, stride=1, padding=2), nn.ReLU(), nn.MaxPool3d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(nn.Conv3d(numf1, numf2, kernel_size=fz2, stride=1, padding=2), nn.ReLU(), nn.MaxPool3d(kernel_size=2, stride=2))
        self.fc1 = nn.Linear(3072, nn2)  # fully connected layers
        self.fc2 = nn.Linear(nn2, nn3)
        self.fc3 = nn.Linear(nn3, 1)
        self.relu = nn.ReLU()  # non-linear ReLU layer: max(0, x)
        self.sigmoid = nn.Sigmoid()
        self.drop_out1 = nn.Dropout(0.5)
        self.drop_out2 = nn.Dropout(0.5)
        self.Relu = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)  # flatten to (batch, features)
        out = self.fc1(out)
        out = self.fc2(out)
        out = self.fc3(out)
        out = self.sigmoid(out)
        return out

I’m not sure whether the number of features is correctly defined without knowing the input shape, but skimming through the model definition, I cannot find any obvious mistakes.
Are you seeing any issues with the model?

Many thanks for your reply.
The feature sizes are fine. I just want to know about the design: is dropout applied at the right step? Is it better to use it after ReLU?

It doesn’t matter if dropout is applied before or after the relu. I cannot see any obvious mistakes.
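A small sketch of why the order does not change the result: dropout multiplies each element by either 0 or the positive factor 1/(1-p), and ReLU commutes with multiplication by a non-negative number. The check below assumes CPU tensors and that resetting the seed reproduces the same dropout mask for both orderings.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8)

torch.manual_seed(0)
relu_after_dropout = F.relu(F.dropout(x, p=0.5, training=True))

torch.manual_seed(0)  # reset so the same dropout mask is drawn again
dropout_after_relu = F.dropout(F.relu(x), p=0.5, training=True)

same = torch.allclose(relu_after_dropout, dropout_after_relu)
print(same)
```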

Many thanks for your reply. I am transferring my model, data, and labels to the GPU. I am not sure whether I should also transfer the criterion and optimizer to the GPU.

I used them in this way

criterion = nn.BCELoss()

optimizer = torch.optim.Adam(model.parameters(), lr=.03)

I appreciate your help

Do you know any books or links that would be useful?


You could transfer the criterion to the GPU just to avoid possible issues, but it shouldn’t be necessary for nn.BCELoss.
One minor piece of advice: I would remove the last sigmoid in your model and use nn.BCEWithLogitsLoss instead, as it will be more numerically stable.
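A rough sketch of the device handling discussed here, with a hypothetical nn.Linear standing in for the real model and random tensors for the data. The optimizer is not moved explicitly: it holds references to the model's parameters, so it is created after the model has been moved. Moving the criterion is optional in this case because it is constructed without a weight tensor.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the real model
model = nn.Linear(20, 1).to(device)

# Optional .to(device); this criterion holds no tensors of its own here
criterion = nn.BCEWithLogitsLoss().to(device)

# Created after model.to(device), so it references the moved parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.03)

# Made-up batch: inputs and labels must live on the same device as the model
inputs = torch.randn(8, 20, device=device)
targets = torch.randint(0, 2, (8, 1), device=device).float()

loss = criterion(model(inputs), targets)
print(loss.item())
```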

Check out “Deep Learning with PyTorch” by @lantiga, @elistevens, and @tom, which can be downloaded for free on the official website.
(It’s not the full book if I’m not mistaken, as it’s still work in progress :wink: )

So that means ending the model at
out = self.fc3(out)
and using nn.BCEWithLogitsLoss? Does it have the sigmoid built in?

def forward(self, x):

    out = self.layer1(x)
    out = self.layer2(out)
    out = out.view(out.size(0), -1)
    out = self.fc1(out)

    out = self.fc2(out)
    out = self.fc3(out)

    return out


criterion = nn.BCEWithLogitsLoss()

So the input to this loss function should be the raw output of the last linear layer?

Yes, that’s the correct usage.
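To illustrate, here is a minimal check that nn.BCEWithLogitsLoss applied to raw linear outputs matches nn.BCELoss applied after a sigmoid. The logits and targets are made-up values.

```python
import torch
import torch.nn as nn

# Made-up raw outputs of a final linear layer, and binary targets
logits = torch.tensor([[-2.0], [0.5], [3.0]])
targets = torch.tensor([[0.0], [1.0], [1.0]])

# Sigmoid is applied internally by BCEWithLogitsLoss
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Equivalent formulation with an explicit sigmoid
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_from_logits.item(), loss_from_probs.item())
```

The two values agree up to floating point error for moderate logits; the fused version is preferred because it stays stable for large positive or negative logits.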

Dear ptrblck

I am now reading the book you suggested, and I am confused. In some training code I have seen, optimizer.zero_grad() is called before running the model and getting the output, but in this book (by lantiga) it is called after getting the output.
Which one is correct?

t_p = model(t_un, *params)
loss = loss_fn(t_p, t_c)
optimizer.zero_grad()
loss.backward()
optimizer.step()

It depends on your coding style, but it should be called before loss.backward(), unless you explicitly want to accumulate the gradients (which is a valid use case, but not the usual workflow).
I personally try to add it right at the beginning of the training loop, since I think I remember it better, but I still forget it from time to time. :slight_smile:
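As a sketch of both points, the loop below calls zero_grad() at the start of each iteration, and the second part shows what happens when it is skipped: a second backward() adds to the existing .grad. The model, data, and names such as grad_once are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)  # hypothetical toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x = torch.randn(16, 3)
y = torch.randn(16, 1)

for epoch in range(5):
    optimizer.zero_grad()        # clear gradients left from the previous iteration
    loss = loss_fn(model(x), y)
    loss.backward()              # compute fresh gradients
    optimizer.step()             # update parameters

# Without zero_grad() between them, two backward passes accumulate
optimizer.zero_grad()
loss_fn(model(x), y).backward()
grad_once = model.weight.grad.clone()
loss_fn(model(x), y).backward()
grad_twice = model.weight.grad.clone()

print(torch.allclose(grad_twice, 2 * grad_once))
```

Calling zero_grad() anywhere between the previous optimizer.step() and the next loss.backward() gives the same result, which is why both placements seen in tutorials work.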

I really appreciate your help :slight_smile:

you are really helping me

Dear ptrblck

I used the approach we discussed, but the outputs are greater than 1 :slight_smile: !!
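One likely explanation, assuming the final sigmoid was removed as suggested earlier in the thread: the model now returns raw logits, which are not limited to the range [0, 1]. A minimal sketch of turning them back into probabilities and class predictions outside the loss (the logits and the 0.5 threshold are made-up values):

```python
import torch

# Made-up raw model outputs (logits); these can be any real number
logits = torch.tensor([[2.7], [-1.3], [0.0]])

# Apply the sigmoid manually when probabilities are needed, e.g. at inference
probs = torch.sigmoid(logits)

# Threshold at 0.5 to get hard class predictions
preds = (probs > 0.5).float()

print(probs, preds)
```

During training the raw logits still go straight into nn.BCEWithLogitsLoss; the explicit sigmoid is only for interpreting the outputs.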