Access all weights of a model

TGI_Fernando · April 21, 2020, 8:58am

Hi,

Is there a way to access all weights of a neural network model?

E.g. If we have the following class

import torch
import torch.nn as nn

class mlp_new(nn.Module):
    def __init__(self, n_in, n_hid, n_out):
        super(mlp_new, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_in, n_hid),
            nn.ReLU(),
            nn.Linear(n_hid, n_out)
        )

    def forward(self, x):
        return self.layers(x)

my_mlp = mlp_new(10, 5, 1)

Can I access all weights of my_mlp (e.g. my_mlp.layers.weight - not working)?

Actually I want to update all weights of the model using my own method with a single statement like optimizer.step().

Please note that I know the weights can be accessed layer-wise (my_mlp.layers[0].weight, my_mlp.layers[2].weight).

Thanks with best regards.

ptrblck · April 22, 2020, 2:16am

You could iterate the parameters to get all weight and bias params via:

for param in model.parameters():
    ...

# or
for name, param in model.named_parameters():
    ...

You cannot access all parameters with a single call.
Each parameter might have (and most likely has) a different shape, can be pushed to a different device etc.
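
For the original goal of updating every parameter with a custom rule in one statement, the loop can be wrapped in a small helper. The following is only a sketch: `custom_step` and its plain gradient-descent rule are hypothetical placeholders for your own method, not part of PyTorch, and the model is the mlp_new architecture from above written inline.

```python
import torch
import torch.nn as nn

# Same architecture as the mlp_new example above, written inline.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))

def custom_step(model, lr=0.1):
    """Hypothetical helper: apply one hand-written update to every parameter."""
    with torch.no_grad():  # keep the in-place update out of the autograd graph
        for param in model.parameters():
            if param.grad is not None:
                param -= lr * param.grad  # placeholder rule: plain gradient descent

x = torch.randn(4, 10)
y = torch.randn(4, 1)

# Snapshot the parameters so the effect of the update can be checked.
before = [p.detach().clone() for p in model.parameters()]

loss = nn.functional.mse_loss(model(x), y)
loss.backward()
custom_step(model)   # single statement, similar in spirit to optimizer.step()
model.zero_grad()    # clear gradients before the next iteration

for (name, param), old in zip(model.named_parameters(), before):
    print(name, tuple(param.shape), "changed:", not torch.equal(param, old))
```

Because each parameter is updated in place, shapes and devices can differ freely between parameters, which is why the loop is needed internally.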

TGI_Fernando · April 22, 2020, 7:03am

Thanks indeed, ptrblck.

With best regards.

TGI Fernando

hpf · September 5, 2020, 4:16am

Hi, I have a question about this. Here is the code for my training phase.
I want to get the weight parameters of the first convolution layer:

for epoch in range(EPOCH): 
    NET.train()
    torch.manual_seed(3)
    print('Epoch:', epoch + 1, 'Training...') 
    for i, (batch_x, batch_y) in enumerate(loader): 
        batch_x=Variable(batch_x)
        output = NET(batch_x)
        loss = lossfunc(output, batch_y)  
        loss.backward()
        optimizer.step()
        wc_loss.append(loss.data)   
        print('Epoch: ', epoch, '| Step: ', i, '| loss: ',wc_loss[i])        
        for name, param in NET.named_parameters():
            if 'network.0.conv1.weight_v' in name:
                parm[name]=param.detach().numpy()  
                b.append(parm['network.0.conv1.weight_v'])

But the result in 【b】 is always repeated, with the same values after every update, like this. (Note: the first convolution layer is a conv1d, conv1d(1,3,kernel=3).)

[array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
 array([[[-11.6726265 ,  12.326811  ,  12.306071  ]],
 
        [[  0.4224267 ,  -0.21204475,  -0.07533604]],
 
        [[ 11.559458  ,  11.859747  ,  12.272797  ]]], dtype=float32),
......
......

However, the value of 【loss】 decreased normally. I don’t know why the value of the weight 【b】 is not updated.

ptrblck · September 5, 2020, 5:01am

I cannot reproduce the constant output using this small code snippet:

import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18()
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
criterion = nn.CrossEntropyLoss()

data = torch.randn(1, 3, 224, 224)
target = torch.randint(0, 1000, (1,))

for _ in range(3):
    out = model(data)
    loss = criterion(out, target)
    loss.backward()
    optimizer.step()
    print(model.fc.weight.detach().numpy())

Try to .clone() the parameter before storing them in the list and let me know, if that helps.
If not, your computation graph might be detached at one point and the parameter might not get any gradients.
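
One likely reason the stored values look identical is that `.detach().numpy()` returns an array sharing memory with the parameter, so every list entry points at the same buffer and shows the latest values. The sketch below illustrates this on a small, made-up `nn.Linear` model (an assumption for illustration, not the network from the question) and compares it with storing a `.clone()`.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)  # small stand-in model, not the one from the question
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data, target = torch.randn(8, 3), torch.randn(8, 1)

aliased, copied = [], []
for _ in range(3):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(data), target)
    loss.backward()
    optimizer.step()
    # .detach().numpy() is a view of the parameter's memory, not a copy
    aliased.append(model.weight.detach().numpy())
    # .clone() takes an independent snapshot before converting
    copied.append(model.weight.detach().clone().numpy())

print("aliased entries share memory:", np.shares_memory(aliased[0], aliased[2]))
print("aliased first == last:", np.array_equal(aliased[0], aliased[2]))
print("copied first == last:", np.array_equal(copied[0], copied[2]))
```

With the aliased list, every entry reflects the most recent update, which would match a list of repeated values; the cloned list keeps one snapshot per step.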

hpf · September 5, 2020, 2:29pm

Thank you for your reply!
Um…
I don’t know how to use clone properly (I’m not going to delve into it this time), so instead I cleared all variables, restarted the program, and restarted training. This time I set 【epoch = 1】 and ran the training manually. I didn’t change the original code; I just ran it again and recorded how the weight parameter value changes in each epoch.

Note: in each epoch, the parameters are updated 1180 times.
I select only one weight parameter in the model (I call it weight B) and observe how its value changes during the updates.
After each training run, I plot the change of the weight as a graph. Then, without any changes, I retrain.

The model was trained 12 times (manually), and the 6 images above were obtained. Each graph shows the updates of weight B.
In the first five training runs, the value of weight B kept changing. But in the sixth run, weight B did not change. From the 6th to the 12th run, weight B still did not change and remained at -0.5233551.
The following is the 【loss curve】 from the 7th to the 12th run. (To save space, I merged these 6 images.)

Does this mean that since the sixth training, the whole model has not learned anything new from the data, and the model has reached its best state? If so, why are the losses still changing?

ptrblck · September 5, 2020, 6:44pm

If the 6 images correspond to epochs, then this particular parameter might not get (large) gradients anymore and is thus constant.
If the loss is still decreasing, other parts of the model seem to be trained further.
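
To check whether a given parameter still receives gradients, its `.grad` can be inspected after `backward()`. This is a minimal sketch on a made-up model; freezing one bias via `requires_grad_(False)` is only an assumed way to imitate a parameter that gets no gradient, and `grad_norms` is a hypothetical name.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
# Freeze one parameter to imitate a weight that receives no gradient.
model[0].bias.requires_grad_(False)

loss = nn.functional.mse_loss(model(torch.randn(16, 4)), torch.randn(16, 1))
loss.backward()

grad_norms = {}
for name, param in model.named_parameters():
    # param.grad stays None if no gradient was computed for this parameter
    grad_norms[name] = None if param.grad is None else param.grad.norm().item()

for name, norm in grad_norms.items():
    print(name, norm)
```

A norm that is `None` or stays near zero over many steps suggests the parameter is effectively constant, even while other parameters continue to reduce the loss.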

hpf · September 6, 2020, 12:51am

Thanks, I think you may be right. I can’t find any other reasonable explanation.

Ohm · March 5, 2021, 2:28am

Hi,
Is there any way to simply convert all weights of a PyTorch model into a single vector? (The model has conv, pool, and … layers, each of which has its own weights.)
(The resulting vector would have dimension 1 * n, where n is the total number of weights in the model.)

ptrblck · March 5, 2021, 5:45am

You could flatten each parameter, append it in a list, and create a tensor:

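import torch
import torchvision.models as models

model = models.resnet18()

params = []
for param in model.parameters():
    params.append(param.view(-1))  # flatten each parameter to 1D
params = torch.cat(params)  # concatenate into one flat tensor
print(params.shape)
> torch.Size([11689512])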

Ohm · March 5, 2021, 3:03pm

AttributeError: ‘Tensor’ object has no attribute ‘append’

ptrblck · March 6, 2021, 5:32am

Based on the error I guess you might have redefined params as a tensor in your code.
Could you check the type of params and make sure it stays a list until you use torch.cat to create the tensor?

Ohm · March 6, 2021, 5:19pm

It works. Thank you!!

Ohm · March 7, 2021, 3:51pm

Is there any simple way to rearrange it back to the initial shapes (after we get params = torch.cat(params))?

I am using “params[0].view(param.data.shape)” and … to do this.

ptrblck · March 8, 2021, 6:10am

You would have to store the original shapes with the parameters in order to reshape them, as this information is lost if you flatten all parameters to a single flat tensor.
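
A minimal sketch of that idea, on a small made-up model: record each parameter's shape and element count before flattening, then split and reshape the flat tensor. The names `shapes`, `sizes`, and `restored` are illustrative. The last lines use `torch.nn.utils.parameters_to_vector` and `vector_to_parameters`, which handle the same bookkeeping; the factor 0.5 is just an arbitrary modification for demonstration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

params = list(model.parameters())
shapes = [p.shape for p in params]    # original shapes, needed to undo the flattening
sizes = [p.numel() for p in params]   # number of elements per parameter

# Flatten everything into one vector (torch.cat creates a copy).
flat = torch.cat([p.detach().reshape(-1) for p in params])

# Undo the flattening: split by element count, then reshape each chunk.
restored = [chunk.view(shape) for chunk, shape in zip(torch.split(flat, sizes), shapes)]

# Built-in utilities that do the same bookkeeping.
flat_builtin = parameters_to_vector(model.parameters())
vector_to_parameters(flat * 0.5, model.parameters())  # write a modified vector back

print(flat.shape, [tuple(r.shape) for r in restored])
```

Note that `restored` holds reshaped views of `flat`, not of the model, so changing them does not change the model unless the values are written back.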

SungmanHong · June 1, 2021, 5:52am

Hi, could you share how to save the weights and biases into a graph?
I want to freeze some layers, so I want to check every weight to verify that the freezing worked.
Thank you.

hpf · June 1, 2021, 9:35am

Graph? Do you mean a picture or the computation graph?
I only know the general way to view them.
View the weight and bias parameters of all layers:

model.state_dict()

Save the model parameters:

torch.save(model.state_dict(), 'model.pth')
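
To verify that layers are frozen, one option is to list each parameter's `requires_grad` flag. This is a sketch on a made-up two-layer model; which layer is frozen here is an assumption for illustration.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))

# Freeze the first layer (index 0 in the Sequential).
for param in model[0].parameters():
    param.requires_grad_(False)

# Frozen parameters have requires_grad == False and are skipped by backward().
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
trainable = [name for name, p in model.named_parameters() if p.requires_grad]

print("frozen:", frozen)
print("trainable:", trainable)
```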

Sourabh · January 22, 2022, 5:31pm

Hello @ptrblck, thank you for the answer. I have been trying to use your answer to implement something like weight clustering for some model optimization experiments in my project. I have posted a question about this at Weight clustering question. As this is the most relevant answer that I have found, I was wondering if you could confirm whether my approach actually makes sense, or whether there is a better or alternative way to achieve the same. It would be great if you could help me out, please.

nivedita_shrivastava · June 6, 2022, 8:13am

Hi,
I wish to store all the model parameters and biases in a NumPy array. I have used the following approach.

for param_tensor in model.state_dict():
    double_x = model.state_dict()[param_tensor].numpy()

Can someone verify if this is the correct way? Additionally, what is the datatype of the single value of the bias?

ptrblck · June 6, 2022, 5:17pm

The code looks generally correct, but note that the dtype would be the one you’ve used in your PyTorch model, so float32 by default unless you’ve explicitly used another format.
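
As a sketch of how this could be collected and checked, the snippet below stores every `state_dict` entry in a dictionary of NumPy arrays and prints each dtype. The model is a made-up example and `weights` is an illustrative name; `.cpu()` and `.clone()` are optional additions that assume you may be on a GPU and want arrays decoupled from later updates.

```python
import numpy as np
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))

# .cpu() also covers models on a GPU; .clone() decouples the arrays
# from later in-place parameter updates.
weights = {
    name: tensor.detach().cpu().clone().numpy()
    for name, tensor in model.state_dict().items()
}

for name, array in weights.items():
    print(name, array.shape, array.dtype)
```

With PyTorch's default settings the arrays, including the single bias value of the last layer, come out as float32.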