Torch model parameters stored in a dictionary

Hi guys!

I wrote a fully connected model like the one below
(the number of fully connected layers depends on the “net_info” variable):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, net_info):
        super().__init__()
        self.net_info = net_info
        # weights and biases are kept in plain Python dicts, keyed by layer index
        self.w, self.b = {}, {}
        for idx, hidden_size in enumerate(self.net_info):
            if idx == 0:
                # first layer: 1-dimensional input -> hidden_size
                self.w[idx] = nn.Parameter(torch.rand(1, hidden_size), requires_grad=True)
                self.b[idx] = nn.Parameter(torch.rand(hidden_size), requires_grad=True)
            else:
                self.w[idx] = nn.Parameter(torch.rand(net_info[idx - 1], hidden_size), requires_grad=True)
                self.b[idx] = nn.Parameter(torch.rand(hidden_size), requires_grad=True)
        # output layer: last hidden size -> 1
        self.w[idx + 1] = nn.Parameter(torch.rand(hidden_size, 1), requires_grad=True)
        self.b[idx + 1] = nn.Parameter(torch.rand(1), requires_grad=True)

    def forward(self, x):
        for idx, hidden_size in enumerate(self.net_info):
            x = F.elu(torch.matmul(x, self.w[idx]) + self.b[idx])
        x = torch.matmul(x, self.w[idx + 1]) + self.b[idx + 1]
        return x
```

The problem with the model above is that “model.parameters()” is empty:

```python
net_info = [50, 60, 70, 50]
model = Model(net_info)
for p in model.parameters():
    print(p)
```

Nothing happens, so there are no trainable parameters! Do you have any idea why?

Using a Python dict will not properly register the parameters so use nn.ParameterDict instead. :wink:
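As a rough sketch of that suggestion, here is the question's model rewritten with nn.ParameterDict. It assumes, like the original code, a 1-dimensional input feature, and it uses string keys (str(idx)) because ParameterDict keys have to be strings. The torch.rand initialization is just carried over from the original for illustration, not a recommended init scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Model(nn.Module):
    def __init__(self, net_info):
        super().__init__()
        self.net_info = net_info
        # ParameterDict registers every nn.Parameter it holds with the module,
        # so they show up in model.parameters(). Keys must be strings.
        self.w = nn.ParameterDict()
        self.b = nn.ParameterDict()
        in_size = 1  # assumption carried over from the question: 1-dim input
        for idx, hidden_size in enumerate(net_info):
            self.w[str(idx)] = nn.Parameter(torch.rand(in_size, hidden_size))
            self.b[str(idx)] = nn.Parameter(torch.rand(hidden_size))
            in_size = hidden_size
        # output layer: last hidden size -> 1
        out = str(len(net_info))
        self.w[out] = nn.Parameter(torch.rand(in_size, 1))
        self.b[out] = nn.Parameter(torch.rand(1))

    def forward(self, x):
        for idx in range(len(self.net_info)):
            x = F.elu(x @ self.w[str(idx)] + self.b[str(idx)])
        out = str(len(self.net_info))
        return x @ self.w[out] + self.b[out]


net_info = [50, 60, 70, 50]
model = Model(net_info)
# 4 hidden layers + 1 output layer, each with a weight and a bias
print(len(list(model.parameters())))  # 10
print(model(torch.rand(8, 1)).shape)  # torch.Size([8, 1])
```

nn.Parameter already defaults to requires_grad=True, so the explicit flag is dropped here. nn.ParameterList would work just as well if you only ever index the layers by position.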


Dear @ptrblck

I have a follow-up question on this topic.
Suppose I define the parameters using nn.ParameterDict as below. During backpropagation, is the error properly propagated from the nth parameter back to the 1st parameter in the dictionary, or is the gradient of the error w.r.t. each parameter computed independently?
What I want to do is define a tensor parameter for each layer so that, during loss.backward(), the gradient of the error w.r.t. these parameters is propagated from the last parameter back to the earlier ones.


```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.params = nn.ParameterDict({
            '1': nn.Parameter(...),
            '2': nn.Parameter(...),
            '3': nn.Parameter(...),
            'n': nn.Parameter(...),
        })

    def forward(self, x):
        # apply the parameters one after another, in insertion order
        for paramKey in self.params.keys():
            x = self.params[paramKey] * x
        return x
```

The gradient calculation depends on the actual computation. If your computation graph uses the parameters sequentially (as seems to be the case in your example), Autograd will compute a gradient for each parameter in the backward pass, and since the parameters are multiplied with each other, each parameter's gradient will depend on the others (e.g. the gradient of the first parameter contains the product of all later ones).
Your example would thus work as if you were calling e.g. nn.Linear layers sequentially.
Each internal weight (and bias) parameter will get the corresponding gradient, and Autograd will use the chain rule in the backward pass.
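A quick way to check this numerically is a small sketch of the follow-up module where the elided nn.Parameter(...) values are replaced by hypothetical scalars (2, 3 and 4, plus an input of 5). After backward(), each parameter's .grad should equal the product of everything else in the chain, which is exactly what the chain rule gives for a sequence of multiplications.

```python
import torch
import torch.nn as nn


class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # hypothetical values: three scalar parameters applied one after another
        self.params = nn.ParameterDict({
            '1': nn.Parameter(torch.tensor(2.0)),
            '2': nn.Parameter(torch.tensor(3.0)),
            '3': nn.Parameter(torch.tensor(4.0)),
        })

    def forward(self, x):
        for key in self.params.keys():
            x = self.params[key] * x
        return x


model = MyModule()
x = torch.tensor(5.0)
out = model(x)  # 4 * 3 * 2 * 5 = 120
out.backward()

# chain rule: d(out)/d(p1) = p3 * p2 * x, d(out)/d(p2) = p3 * p1 * x,
#             d(out)/d(p3) = p2 * p1 * x
print(model.params['1'].grad)  # tensor(60.)
print(model.params['2'].grad)  # tensor(40.)
print(model.params['3'].grad)  # tensor(30.)
```

So the gradient of the first parameter is not computed in isolation: it is the upstream gradient multiplied back through every later parameter.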