Access all weights of a model

Hi,

Is there a way to access all weights of a neural network model?

E.g. If we have the following class

import torch
import torch.nn as nn

class mlp_new(nn.Module):
    def __init__(self, n_in, n_hid, n_out):
        super(mlp_new, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_in, n_hid),
            nn.ReLU(),
            nn.Linear(n_hid, n_out)
        )

    def forward(self, x):
        return self.layers(x)

my_mlp = mlp_new(10, 5, 1)

Can I access all weights of my_mlp (e.g. my_mlp.layers.weight - not working)?

Actually I want to update all weights of the model using my own method with a single statement like optimizer.step().

Please note that I know the weights can be accessed layer-wise (my_mlp.layers[0].weight, my_mlp.layers[2].weight).

Thanks with best regards.

You could iterate the parameters to get all weight and bias params via:

for param in model.parameters():
    ....

# or
for name, param in model.named_parameters():
    ...

You cannot access all parameters with a single call.
Each parameter might have (and most likely has) a different shape, can be pushed to a different device etc.
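As a rough illustration of the loop above, the following sketch lists the parameter names of a small MLP and then applies a hand-written update to every parameter in one loop. The update rule (a plain gradient step with a hypothetical step size of 0.01) and the stand-in loss are assumptions for the example, not something from the question.

```python
import torch
import torch.nn as nn

# Same layer layout as the MLP in the question, built directly as a Sequential
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))

# named_parameters() yields (name, tensor) pairs for every weight and bias
names = [name for name, _ in model.named_parameters()]
print(names)

# Produce gradients with a stand-in loss (assumption for this sketch)
loss = model(torch.randn(4, 10)).sum()
loss.backward()

before = [p.detach().clone() for p in model.parameters()]

# Custom update: modify each parameter in place, outside of autograd tracking
step_size = 0.01  # hypothetical value
with torch.no_grad():
    for p in model.parameters():
        p -= step_size * p.grad

after = list(model.parameters())
```

Wrapping the loop in a small function would give a single call comparable to optimizer.step().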


Thanks indeed, ptrblck.

With best regards.

TGI Fernando

Hi, I have a question about this. Here is the code for my training phase.
I want to get the weight parameters of the first convolution layer.

for epoch in range(EPOCH): 
    NET.train()
    torch.manual_seed(3)
    print('Epoch:', epoch + 1, 'Training...') 
    for i, (batch_x, batch_y) in enumerate(loader): 
        batch_x=Variable(batch_x)
        output = NET(batch_x)
        loss = lossfunc(output, batch_y)  
        loss.backward()
        optimizer.step()
        wc_loss.append(loss.data)   
        print('Epoch: ', epoch, '| Step: ', i, '| loss: ',wc_loss[i])        
        for name, param in NET.named_parameters():
            if 'network.0.conv1.weight_v' in name:
                parm[name]=param.detach().numpy()  
                b.append(parm['network.0.conv1.weight_v'])  

But the contents of 【b】 are always repeated: every update gives the same result, like this. (Note: the first convolution layer is a conv1d, conv1d(1, 3, kernel=3).)

[array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
......
......

However, the value of 【loss】 decreased normally. I don’t know why the value of the weight 【b】 is not updated.

I cannot reproduce the constant output using this small code snippet:

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
criterion = nn.CrossEntropyLoss()

data = torch.randn(1, 3, 224, 224)
target = torch.randint(0, 1000, (1,))

for _ in range(3):
    out = model(data)
    loss = criterion(out, target)
    loss.backward()
    optimizer.step()
    print(model.fc.weight.detach().numpy())

Try to .clone() the parameter before storing them in the list and let me know, if that helps.
If not, your computation graph might be detached at one point and the parameter might not get any gradients.
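One possible reason the stored values can look identical: param.detach().numpy() returns an array that shares memory with the parameter, so every stored entry reflects the latest in-place update. The sketch below contrasts that with a cloned copy; the tiny nn.Linear model, the random data, and the learning rate are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)

aliased, copied = [], []
for _ in range(3):
    optimizer.zero_grad()
    loss = layer(torch.randn(4, 3)).pow(2).mean()
    loss.backward()
    optimizer.step()
    # Shares memory with the parameter: later steps overwrite this "snapshot"
    aliased.append(layer.weight.detach().numpy())
    # clone() makes an independent copy, so the snapshot is preserved
    copied.append(layer.weight.detach().clone().numpy())

# Every aliased entry shows the final weights; the cloned entries differ
all_aliased_equal = all((a == aliased[-1]).all() for a in aliased)
copies_differ = not (copied[0] == copied[-1]).all()
print(all_aliased_equal, copies_differ)
```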

Thank you for your reply!
I don’t know how to use clone properly (I’m not going to delve into it this time), so my choice was to clear all variables, restart the program, and restart training. This time I set 【epoch = 1】 and reran the training manually. I didn’t change the original code, I just ran it again. This way I get the change of the weight parameter value in each epoch.

Note: for each epoch, the parameter is updated 1180 times.
I only select a certain weight parameter (I call it weight B) in the model and observe how its value changes during the updates.
After each training run ends, I plot the change of the weight as a graph. Then, without any changes, I retrain.

[Images 1 to 6: plots of weight B over the updates of each training run]

The model was trained 12 times (manual training), and the above 6 images were obtained. Each graph shows the updates of weight B.
It can be seen that in the first five training runs, the value of weight B kept changing. But in the sixth run, weight B did not change. From the 6th to the 12th run, weight B still did not change and remained at -0.5233551.
The following is the 【loss curve】 from the 7th to the 12th run. (To save space, I merged these 6 images.)

[Image 7: merged loss curves for training runs 7 to 12]

Does this mean that since the sixth training, the whole model has not learned anything new from the data, and the model has reached its best state? If so, why are the losses still changing?

If the 6 images correspond to epochs, then this particular parameter might not get (large) gradients anymore and is thus constant.
If the loss is still decreasing, other parts of the model seem to be trained further.
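To check whether a particular parameter still receives a gradient, one option is to print the gradient norm of every parameter after backward(). This is a minimal sketch on a stand-in model with a stand-in loss (both assumptions); in the training loop above the same loop would go right after loss.backward().

```python
import torch
import torch.nn as nn

# Stand-in model and loss, used only to have gradients to inspect
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
loss = model(torch.randn(8, 10)).pow(2).mean()
loss.backward()

# A norm of (nearly) zero means the parameter will barely move on this step;
# None means the parameter received no gradient at all
grad_norms = {}
for name, param in model.named_parameters():
    grad_norms[name] = None if param.grad is None else param.grad.norm().item()

for name, norm in grad_norms.items():
    print(name, norm)
```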


Thanks, I think maybe you are right. I can’t find any other reasonable explanation.

Hi,
Is there any way to simply convert all weights of a PyTorch model into a single vector? (The model has conv, pool, and … layers, each of which has its own weights.)
(The resulting vector will have dimension 1 * n, where n is the total number of weights in the PyTorch model.)

You could flatten each parameter, append it in a list, and create a tensor:

import torch
from torchvision import models

model = models.resnet18()

params = []
for param in model.parameters():
    params.append(param.view(-1))
params = torch.cat(params)
print(params.shape)
> torch.Size([11689512])

AttributeError: ‘Tensor’ object has no attribute ‘append’

Based on the error I guess you might have redefined params as a tensor in your code.
Could you check the type of params and make sure it stays a list until you use torch.cat to create the tensor?

It works. Thank you!!

Is there any simple way to rearrange it back to the initial shapes (after we get params = torch.cat(params))?

I am using “params[0].view(param.data.shape)” and … to do this.

You would have to store the original shapes with the parameters in order to reshape them, as this information is lost if you flatten all parameters to a single flat tensor.
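A minimal sketch of that idea, assuming a small stand-in model: record each parameter's shape and element count before flattening, then split the flat tensor by those counts and view each piece in its original shape. PyTorch also ships torch.nn.utils.parameters_to_vector and torch.nn.utils.vector_to_parameters, which cover the same round trip.

```python
import torch
import torch.nn as nn

# Stand-in model (assumption); any nn.Module works the same way
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))

# Remember the shape and number of elements of every parameter
shapes = [p.shape for p in model.parameters()]
sizes = [p.numel() for p in model.parameters()]

# Flatten everything into one 1-D tensor
flat = torch.cat([p.detach().reshape(-1) for p in model.parameters()])

# Split by element count, then restore each original shape
chunks = torch.split(flat, sizes)
restored = [chunk.view(shape) for chunk, shape in zip(chunks, shapes)]

print(flat.shape, [tuple(r.shape) for r in restored])
```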

Hi, could you share how to save the weights and biases into a graph?
I want to check every weight because I want to freeze some layers and verify that it was done correctly.
Thank you.

Graph? Do you mean a picture or the computation graph?
I only know the general way to view the weights.
View the weight and bias parameters of the whole network:

model.state_dict()

Save the model parameters:
torch.save(model.state_dict(), 'model.pth')
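For checking that layers are frozen, one possible approach is to inspect requires_grad on each named parameter and to confirm that a frozen weight is unchanged after an optimizer step. The stand-in model, the choice of which layer to freeze, and the learning rate below are all assumptions for the sketch.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))

# Freeze the first linear layer (hypothetical choice)
for param in model[0].parameters():
    param.requires_grad_(False)

# Overview: which parameters are still trainable?
trainable = {name: p.requires_grad for name, p in model.named_parameters()}
print(trainable)

# Only hand the trainable parameters to the optimizer
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)

# Snapshot the frozen weight, take one step, and compare
frozen_before = model[0].weight.detach().clone()
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
optimizer.step()
frozen_unchanged = torch.equal(frozen_before, model[0].weight)
print(frozen_unchanged)
```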


Hello @ptrblck, thank you for the answer. I have been trying to use it to implement something like weight clustering for some model optimization experiments in my project. I have posted a question about this at Weight clustering question. As this is the most relevant answer I have found, I was wondering if you could confirm whether my approach makes sense, or whether there is a better or alternative way to achieve the same. It would be great if you could help me out, please.