Not all learnable parameters are listed in model.parameters

Gkv · June 8, 2018, 5:54am

Hi all,
I created my model as follows. self.outv is a learnable parameter.

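import numpy as np
import torch
import torch.nn as nn
from torch import autograd
from torch.autograd import Variable

gpu_device = 0  # assumed GPU index; not defined in the original post

class model(nn.Module):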
    def __init__(self):
        super(model,self).__init__()
        self.lstm1=nn.LSTM(300,1024,num_layers=1)
        self.batch_size=1
        self.tanh=nn.Tanh()
        self.softmax=nn.Softmax()
        self.fc1=nn.Linear(1024,512)
        self.drop=nn.Dropout(0.5)
        self.outv=nn.Parameter(torch.randn(512))
        if torch.cuda.is_available():
            self.lstm1=self.lstm1.cuda(gpu_device)
            self.fc1=self.fc1.cuda(gpu_device)
            self.outv=self.outv.cuda(gpu_device)
    def forward(self,features):
        emb1=self.get_embedding(features)
        out1=self.fc1(emb1)
        out1=self.drop(out1)
        out2=torch.dot(self.outv,out1)
        return out2
    def get_embedding(self,features):
        # initialization of hidden state: creates two variables, each containing a 1x1x1024 tensor
        hidden = (autograd.Variable(torch.randn(1, 1, 1024)).cuda(gpu_device),
                  autograd.Variable(torch.randn(1, 1, 1024)).cuda(gpu_device))
        #passing data
        for w_embed in features:
            self.lstm1.flatten_parameters()
            out,hidden=self.lstm1(Variable(torch.from_numpy(np.array(w_embed)),volatile=True).view(1,1,300).cuda(gpu_device),hidden)
        return out
 
m=model()
m=m.cuda(gpu_device)
print(m.parameters)
#print output
<bound method Module.parameters of model(
  (lstm1): LSTM(300, 1024)
  (tanh): Tanh()
  (softmax): Softmax()
  (fc1): Linear(in_features=1024, out_features=512, bias=True)
  (drop): Dropout(p=0.5)
)>

I have a few questions.

Why is self.outv not listed in m.parameters? Is there a problem in my code?
How can I move the model to multiple GPUs?

Thanks

crcrpar · June 8, 2018, 7:49am

Hi,

If you want to register a parameter on a module, self.register_parameter(name, parameter) would be preferable.
As for the multi-GPU question, I’m sorry, I do not have an answer right now.

ptrblck · June 8, 2018, 9:39am

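If you just have parameters and submodules registered in your __init__, you don’t have to handle the CUDA assignments yourself: calling m.cuda(gpu_device) on the model moves all of them.
Just remove the if torch.cuda.is_available block and your parameters will be registered correctly.
Also, .parameters() is a method. print(m.parameters) only prints the bound method together with the module repr, which lists submodules but not bare parameters such as outv. You should call it like: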

print(list(m.parameters()))

or

print(list(m.named_parameters()))
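
As a minimal sketch of the difference (TinyModel is a made-up stand-in, not the model from the question, and everything runs on the CPU): a bare nn.Parameter shows up in named_parameters() but not in the module repr. The last lines use .double() as a CPU stand-in for .cuda() to show that such a call on a parameter returns a plain tensor rather than an nn.Parameter, which is why the model as a whole should be moved instead.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        # Assigning an nn.Parameter as an attribute registers it automatically.
        self.outv = nn.Parameter(torch.randn(4))

    def forward(self, x):
        # x is a 1-D tensor with 8 features
        return torch.dot(self.outv, self.fc1(x))


m = TinyModel()

# Calling the method lists every registered parameter, including outv.
param_names = [name for name, _ in m.named_parameters()]
print(param_names)

# The module repr only lists submodules, so outv is absent from it.
print("outv" in repr(m))

# Converting a parameter yields a new, plain tensor (not an nn.Parameter).
converted = m.outv.double()
print(isinstance(m.outv, nn.Parameter), isinstance(converted, nn.Parameter))
```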

Gkv · June 8, 2018, 1:47pm

@ptrblck Thanks a lot. The parameters part is clear now. So if I call loss.backward(), all the listed parameters will be updated, right? In which situations do I have to handle CUDA assignments myself? (Could you please share some links that explain this?)

How can I run this model on multiple GPUs? Does the following code work?

if torch.cuda.is_available():
    m=m.cuda()
    m = nn.DataParallel(m, device_ids=None)

If so, what modifications do I need to make in the get_embedding function for it to work on multiple GPUs? (I mean, what do I need to put in place of .cuda(gpu_device)?)

Thanks

ptrblck · June 8, 2018, 2:17pm

You could get the device from your self.lstm1:

import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.lstm1 = nn.LSTM(300, 1024, num_layers=1)
        
    def forward(self, x):
        x = self.get_embedding(x)
        return x
        
    def get_embedding(self, x):
        device = 'cpu'
        if next(self.lstm1.parameters()).is_cuda:
            device = 'cuda:' + str(next(self.lstm1.parameters()).get_device())
        print('Using device: {}'.format(device))
        hidden = (torch.randn(1, 1, 1024).to(device),
                  torch.randn(1, 1, 1024).to(device))
        for w_embed in x:
            self.lstm1.flatten_parameters()
            out, hidden = self.lstm1(w_embed.detach().view(1, 1, 300).to(device), hidden)
        return out


model = MyModel()
model = model.to(0)
model = nn.DataParallel(model, device_ids=[0, 1])

x = torch.randn(2, 300)
output = model(x)
print(output)
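
On PyTorch 0.4.0 or later, the device lookup can also be written with the .device attribute of a parameter. The following is only a single-device sketch run on the CPU; DeviceAwareLSTM is a hypothetical name used for illustration. Inside nn.DataParallel replicas on newer releases the parameter iterator may be empty, in which case the input's own device (x.device) is an alternative.

```python
import torch
import torch.nn as nn


class DeviceAwareLSTM(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.lstm1 = nn.LSTM(300, 1024, num_layers=1)

    def forward(self, x):
        # Follow whatever device the LSTM weights currently live on.
        device = next(self.lstm1.parameters()).device
        hidden = (torch.randn(1, 1, 1024, device=device),
                  torch.randn(1, 1, 1024, device=device))
        out = None
        for w_embed in x:
            # One time step per row of x: (seq_len=1, batch=1, features=300)
            out, hidden = self.lstm1(w_embed.view(1, 1, 300).to(device), hidden)
        return out


model = DeviceAwareLSTM()
output = model(torch.randn(2, 300))
print(output.shape)
```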

Gkv · June 8, 2018, 4:41pm

@ptrblck Is .to(device) analogous to .cuda(device)? When I ran your code, it said: MyModel object has no attribute to. I am using PyTorch 0.3.

Thanks

ptrblck · June 8, 2018, 4:46pm

It’s the same syntax if you push your model to the GPU, but not if it should stay on the CPU.
If you are sure you will use it on the GPU, you can just change it to .cuda(device).
Generally I would recommend updating to the current stable release (0.4.0), which introduced .to().

Jaideep_Valani · May 11, 2019, 9:31am

Hi ptrblck,
I have a question regarding model.parameters.
Assume my model is a sequence of blocks, each of which is in turn a list of modules:
0
Sequential(
(0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=True)
(1): ReLU()
(2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
1
Sequential(
(0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=True)
(1): ReLU()
(2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

In this case, if I do
for i in model[0].parameters():
    print(i)

how many times would the loop run: 3 times (weight and bias of the conv, plus the batch norm parameters), 4 times, or more?

ptrblck · May 11, 2019, 4:42pm

If model[0] refers to the first nn.Sequential module, you’ll get 4 parameters:

conv.weight
conv.bias
bn.weight
bn.bias
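
A quick way to check this is to build one such block and list its parameter names. This is a small sketch that assumes the block matches the printout in the question; the names are prefixed with the index of each layer inside the nn.Sequential. The batch norm running statistics are buffers, so they are not counted.

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=True),
    nn.ReLU(),
    nn.BatchNorm2d(16),
)

# ReLU has no parameters, so only the conv and batch norm layers contribute.
names = [name for name, _ in block.named_parameters()]
print(names)  # ['0.weight', '0.bias', '2.weight', '2.bias']

# running_mean, running_var and num_batches_tracked are buffers, not parameters.
buffer_names = [name for name, _ in block.named_buffers()]
print(buffer_names)
```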

Jaideep_Valani · May 12, 2019, 8:40am

Thanks. So this would pull out only those modules which have a weight or bias attribute?
So we don't need to explicitly check using hasattr(m, 'weight')?

Secondly, I have one more question regarding the weights: what is the difference between m.weight and m.weight.data?

ptrblck · May 12, 2019, 1:43pm

No, model.parameters() will list all registered parameters. E.g. if you are using a custom module and assign a parameter as

self.my_param = nn.Parameter(torch.randn(1))

it will also be listed in model.parameters().

What is your use case for checking whether the module has weight as a parameter?

weight.data refers to the underlying data tensor. If you manipulate it, Autograd won’t be able to track this operation, which might result in wrong calculations. Therefore I would recommend avoiding the .data attribute.
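
To illustrate the kind of wrong calculation meant here, the sketch below (a toy example, not taken from the thread) modifies the output of a sigmoid in place, once through .data and once through .detach(). The sigmoid backward pass reuses its output, so the .data variant silently produces zero gradients, while the .detach() variant lets Autograd notice the in-place change and raise an error.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Case 1: in-place change through .data; Autograd does not notice it.
out = a.sigmoid()
out.data.zero_()
out.sum().backward()
silent_grad = a.grad.clone()
print(silent_grad)  # all zeros, although the true gradient is non-zero

# Case 2: the same change through .detach(); Autograd detects it.
a.grad = None
out = a.sigmoid()
out.detach().zero_()
try:
    out.sum().backward()
    detected = False
except RuntimeError:
    detected = True
print(detected)
```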

Alpha · August 31, 2019, 10:12am

What gets registered in model.parameters()?
As far as I know so far:

conv: weight, bias
bn: weight, bias
nn.Parameter()

Is there anything else?
Thank you in advance.

111179 · March 14, 2021, 2:07pm

Hello, Mr. ptrblck.
When I use print(list(m.named_parameters())) in my code, I still cannot get the learnable parameter threshold. Could you give me any suggestions? Thanks a lot.
Below is my code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Surrogate_BP_Function(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        out = torch.zeros_like(input).cuda()
        out[input > 0] = 1.0
        return out

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad = grad_input * 0.3 * F.threshold(1.0 - torch.abs(input), 0, 0)
        return grad

class SNN_VGG9_BNTT(nn.Module):
    def __init__(self, num_steps, leak_mem=0.95, img_size=32,  num_cls=10):
        super(SNN_VGG9_BNTT, self).__init__()

        self.img_size = img_size
        self.num_cls = num_cls
        self.num_steps = num_steps
        self.spike_fn = Surrogate_BP_Function.apply
        self.leak_mem = leak_mem
        self.batch_num = self.num_steps

        print (">>>>>>>>>>>>>>>>>>> VGG 9 >>>>>>>>>>>>>>>>>>>>>>")
        print ("***** time step per batchnorm {}".format(self.batch_num))
        print (">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")

        affine_flag = True
        bias_flag = False

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=bias_flag)
        self.bntt1 = nn.ModuleList([nn.BatchNorm2d(64, eps=1e-4, momentum=0.1, affine=affine_flag) for i in range(self.batch_num)])
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=bias_flag)
        self.bntt2 = nn.ModuleList([nn.BatchNorm2d(64, eps=1e-4, momentum=0.1, affine=affine_flag) for i in range(self.batch_num)])
        self.pool1 = nn.AvgPool2d(kernel_size=2)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1, bias=bias_flag)
        self.bntt3 = nn.ModuleList([nn.BatchNorm2d(128, eps=1e-4, momentum=0.1, affine=affine_flag) for i in range(self.batch_num)])
        self.conv4 = nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1, bias=bias_flag)
        self.bntt4 = nn.ModuleList([nn.BatchNorm2d(128, eps=1e-4, momentum=0.1, affine=affine_flag) for i in range(self.batch_num)])
        self.pool2 = nn.AvgPool2d(kernel_size=2)

        self.conv5 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1, bias=bias_flag)
        self.bntt5 = nn.ModuleList([nn.BatchNorm2d(256, eps=1e-4, momentum=0.1, affine=affine_flag) for i in range(self.batch_num)])
        self.conv6 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=bias_flag)
        self.bntt6 = nn.ModuleList([nn.BatchNorm2d(256, eps=1e-4, momentum=0.1, affine=affine_flag) for i in range(self.batch_num)])
        self.conv7 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1, bias=bias_flag)
        self.bntt7 = nn.ModuleList([nn.BatchNorm2d(256, eps=1e-4, momentum=0.1, affine=affine_flag) for i in range(self.batch_num)])
        self.pool3 = nn.AvgPool2d(kernel_size=2)


        self.fc1 = nn.Linear((self.img_size//8)*(self.img_size//8)*256, 1024, bias=bias_flag)
        self.bntt_fc = nn.ModuleList([nn.BatchNorm1d(1024, eps=1e-4, momentum=0.1, affine=affine_flag) for i in range(self.batch_num)])
        self.fc2 = nn.Linear(1024, self.num_cls, bias=bias_flag)

        self.conv_list = [self.conv1, self.conv2, self.conv3, self.conv4, self.conv5, self.conv6, self.conv7]
        self.bntt_list = [self.bntt1, self.bntt2, self.bntt3, self.bntt4, self.bntt5, self.bntt6, self.bntt7, self.bntt_fc]
        self.pool_list = [False, self.pool1, False, self.pool2, False, False, self.pool3]

        # Turn off bias of BNTT
        for bn_list in self.bntt_list:
            for bn_temp in bn_list:
                bn_temp.bias = None


        # Initialize the firing thresholds of all the layers
        for m in self.modules():
            if (isinstance(m, nn.Conv2d)):
                m.threshold = 1.0
                torch.nn.init.xavier_uniform_(m.weight, gain=2)
            elif (isinstance(m, nn.Linear)):
                m.threshold = 1.0
                torch.nn.init.xavier_uniform_(m.weight, gain=2)

    def forward(self, inp):
        batch_size = inp.size(0)
        mem_conv1 = torch.zeros(batch_size, 64, self.img_size, self.img_size).cuda()
        mem_conv2 = torch.zeros(batch_size, 64, self.img_size, self.img_size).cuda()
        mem_conv3 = torch.zeros(batch_size, 128, self.img_size//2, self.img_size//2).cuda()
        mem_conv4 = torch.zeros(batch_size, 128, self.img_size//2, self.img_size//2).cuda()
        mem_conv5 = torch.zeros(batch_size, 256, self.img_size//4, self.img_size//4).cuda()
        mem_conv6 = torch.zeros(batch_size, 256, self.img_size//4, self.img_size//4).cuda()
        mem_conv7 = torch.zeros(batch_size, 256, self.img_size//4, self.img_size//4).cuda()
        mem_conv_list = [mem_conv1, mem_conv2, mem_conv3, mem_conv4, mem_conv5, mem_conv6, mem_conv7]

        mem_fc1 = torch.zeros(batch_size, 1024).cuda()
        mem_fc2 = torch.zeros(batch_size, self.num_cls).cuda()
        for t in range(self.num_steps):

            for i in range(len(self.conv_list)):
                mem_conv_list[i] = self.leak_mem * mem_conv_list[i] + self.bntt_list[i][t](self.conv_list[i](inp))
                mem_thr = (mem_conv_list[i] / self.conv_list[i].threshold) - 1.0
                out = self.spike_fn(mem_thr)
                rst = torch.zeros_like(mem_conv_list[i]).cuda()
                rst[mem_thr > 0] = self.conv_list[i].threshold
                mem_conv_list[i] = mem_conv_list[i] - rst
                out_prev = out.clone()
                if self.pool_list[i] is not False:
                    out = self.pool_list[i](out_prev)
                    out_prev = out.clone()
            out_prev = out_prev.reshape(batch_size, -1)
            mem_fc1 = self.leak_mem * mem_fc1 + self.bntt_fc[t](self.fc1(out_prev))
            mem_thr = (mem_fc1 / self.fc1.threshold) - 1.0
            out = self.spike_fn(mem_thr)
            rst = torch.zeros_like(mem_fc1).cuda()
            rst[mem_thr > 0] = self.fc1.threshold
            mem_fc1 = mem_fc1 - rst
            out_prev = out.clone()
            mem_fc2 = mem_fc2 + self.fc2(out_prev)
        out_voltage = mem_fc2 / self.num_steps
        return out_voltage

ptrblck · March 14, 2021, 10:40pm

threshold is used from the functional API via F.threshold, which is a function and is thus not trainable, and also assigned as an attribute to some modules as a scalar, which isn’t a trainable nn.Parameter either.
You would thus have to define it as an nn.Parameter, so that it’ll be returned by model.parameters().
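
A minimal sketch of that idea, assuming a hypothetical wrapper module ThresholdedConv rather than the full SNN model above: once the threshold is an nn.Parameter, it is listed by named_parameters() and receives a gradient when it takes part in the forward computation.

```python
import torch
import torch.nn as nn


class ThresholdedConv(nn.Module):  # hypothetical wrapper, not part of the model above
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
        # Registered as a parameter, so optimizers can update it.
        self.threshold = nn.Parameter(torch.tensor(1.0))

    def forward(self, x):
        # Same normalization as mem_thr in the posted forward pass.
        return self.conv(x) / self.threshold - 1.0


layer = ThresholdedConv()
names = [name for name, _ in layer.named_parameters()]
print(names)

layer(torch.randn(1, 3, 4, 4)).sum().backward()
print(layer.threshold.grad is not None)
```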

111179 · March 15, 2021, 1:21am

Thank you so much for your kind reply. If threshold is not converted to an nn.Parameter, is there any other way to print threshold? Or is there any other way to have it loaded with the model’s state_dict?

Best regards.

ptrblck · March 15, 2021, 5:03am

If you want to register this tensor inside the module, so that it would also be added to the state_dict, but would remain constant, you could register it as a buffer via:

m.register_buffer('threshold', torch.tensor([1.]))
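
As a small sketch on a single standalone layer (not the full model above): a buffer appears in the state_dict but is not returned by named_parameters(), so an optimizer will not update it.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
conv.register_buffer('threshold', torch.tensor([1.]))

# The buffer is saved and loaded together with the weights...
state_keys = list(conv.state_dict().keys())
print(state_keys)

# ...but it is not a trainable parameter.
param_names = [name for name, _ in conv.named_parameters()]
print(param_names)

# It can still be read like a normal attribute.
print(conv.threshold)
```

Note that register_buffer would take the place of the plain m.threshold = 1.0 assignment; calling it after that assignment raises a KeyError because the attribute name is already taken.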

111179 · March 15, 2021, 5:31am

Thank you so much. I will try it.