But when I update a parameter with setattr, it is already deleted from the state_dict, and I need the value to stay in the state_dict when I do the setattr. So if I print
print("length of state_dict ", len(module.state_dict().keys()))
after the first operation, the length is one less; I want it to be the same length after the second line. Thanks a lot for your help.
More context on what I am doing: I need to derive all parameters of the model from one parameter. I therefore define that parameter, freeze all the other parameters, train only that one, and compute and substitute all the remaining parameters based on it.
Please note I cannot update parameters like this, as I said: I compute each parameter from one shared parameter, and that shared parameter is the only trainable one.
Hi @ptrblck
Thanks for the response. The example you gave only works if you define ‘w1’ as an nn.Parameter; let’s see what happens if it is not:
import torch
import torch.nn as nn
from collections import OrderedDict

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.w0 = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return x + self.w0

model = MyModel()
print(model.state_dict())
> OrderedDict([('w0', tensor([0.4342]))])

delattr(model, 'w0')
print(model.state_dict())
> OrderedDict()

setattr(model, 'w0', torch.randn(1))  # not nn.Parameter(torch.randn(1))
print(model.state_dict())
> OrderedDict()
Basically, here is what I need to do: I freeze each parameter of the network, then project it into a random space, like
theta = theta + P X
where X is the parameter shared among all parameters of the network and the only one that gets updated, and theta is the initial value of that parameter.
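One way to express that setup is to keep theta and P as buffers (so they are saved but never trained) and recompute the effective weight inside forward, so gradients flow only into the shared X. This is just a minimal sketch; the module name ProjectedScale and the elementwise forward are hypothetical, not from the thread:

```python
import torch
import torch.nn as nn

class ProjectedScale(nn.Module):
    """Hypothetical sketch: effective weight = theta0 + P @ x,
    where only the shared parameter x is trainable."""
    def __init__(self, theta0, P, shared_x):
        super().__init__()
        # frozen initial value and projection matrix: buffers, so they
        # appear in the state_dict but not in model.parameters()
        self.register_buffer("theta0", theta0)
        self.register_buffer("P", P)
        self.shared_x = shared_x  # the one shared nn.Parameter

    def forward(self, inp):
        # recompute theta on every forward so it stays in the autograd
        # graph and gradients flow back to shared_x only
        theta = self.theta0 + self.P @ self.shared_x
        return inp * theta

x = nn.Parameter(torch.randn(3))
layer = ProjectedScale(torch.randn(5), torch.randn(5, 3), x)
layer(torch.ones(5)).sum().backward()  # x.grad is populated
```

The same x instance can be passed to several such modules; duplicated parameters are deduplicated when a parent module iterates over named_parameters().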
Only the nn.Parameter will show up in the state dict (or in the model.parameters()). This is expected.
And this is also what you want, right? As you want it to be shared across multiple models.
If you want something to be a regular Tensor and saved in the state dict as well, you can use buffers via model.register_buffer("my_buffer", my_tensor).
Hi @ptrblck, @albanD
Thank you for the response. I need to keep these other parameters on the CPU, but buffers in the state_dict are pushed to the GPU when one calls model.cuda(). I would greatly appreciate help on how I can add them as buffers while keeping their values on the CPU; I am working at large scale, and keeping them on the CPU is crucial for me. Thanks!
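One possible workaround, sketched below under the assumption that the frozen tensors are registered buffers: assigning a tensor to a registered buffer's attribute name updates the buffer in place, so after model.cuda() you can move selected buffers back to the CPU while they remain in the state_dict. The model name BigModel is hypothetical. Note the caveat that any later .cuda()/.to() call would move them again, so the reassignment has to be repeated after each such call:

```python
import torch
import torch.nn as nn

class BigModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.x = nn.Parameter(torch.randn(1))            # trainable, fine on GPU
        self.register_buffer("theta0", torch.randn(10))  # large frozen values

model = BigModel()
if torch.cuda.is_available():
    model.cuda()
    # reassigning a registered buffer keeps its state_dict entry,
    # so we can pin it back to the CPU after .cuda() moved everything
    model.theta0 = model.theta0.cpu()

print(model.theta0.device)  # cpu either way
```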