But when I update a parameter with setattr, it is already deleted from the state_dict, and I need the value to stay in the state_dict when I do the setattr. So if I print
print("length of state_dict ", len(module.state_dict().keys()))
after the first operation, the length is one less; I want it to be the same length after the second line. Thanks a lot for your help.
More context on what I am doing: I need to derive all parameters of the model from one parameter. I therefore define that parameter, freeze all the other parameters, train only that one, and compute and substitute all the remaining parameters based on it.
Please note I cannot update parameters like this, as I said: I compute each parameter from one shared parameter, and that shared parameter is the only trainable one.
Hi @ptrblck
Thanks for the response. The example you gave only works if you define ‘w1’ as an nn.Parameter; let’s see what happens if it is not:
import torch
import torch.nn as nn
from collections import OrderedDict

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.w0 = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return x + self.w0

model = MyModel()
print(model.state_dict())
> OrderedDict([('w0', tensor([0.4342]))])

delattr(model, 'w0')
print(model.state_dict())
> OrderedDict()

setattr(model, 'w0', torch.randn(1))  # not nn.Parameter(torch.randn(1))
print(model.state_dict())
> OrderedDict()
Basically, here is what I need to do: I freeze each parameter of the network, then project it into a random space, like
theta = theta + P X
where X is the parameter shared among all parameters of the network and the only one that gets updated, and theta is the initial value of that parameter.
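One way to express that setup is to keep theta and P as buffers (so they are saved but never trained) and recompute the effective weight inside forward, so gradients flow only into the shared X. This is just a minimal sketch; the module name ProjectedScale and the elementwise forward are hypothetical, not from the thread:

```python
import torch
import torch.nn as nn

class ProjectedScale(nn.Module):
    """Hypothetical sketch: effective weight = theta0 + P @ x,
    where only the shared parameter x is trainable."""
    def __init__(self, theta0, P, shared_x):
        super().__init__()
        # frozen initial value and projection matrix: buffers, so they
        # appear in the state_dict but not in model.parameters()
        self.register_buffer("theta0", theta0)
        self.register_buffer("P", P)
        self.shared_x = shared_x  # the one shared nn.Parameter

    def forward(self, inp):
        # recompute theta on every forward so it stays in the autograd
        # graph and gradients flow back to shared_x only
        theta = self.theta0 + self.P @ self.shared_x
        return inp * theta

x = nn.Parameter(torch.randn(3))
layer = ProjectedScale(torch.randn(5), torch.randn(5, 3), x)
layer(torch.ones(5)).sum().backward()  # x.grad is populated
```

The same x instance can be passed to several such modules; duplicated parameters are deduplicated when a parent module iterates over named_parameters().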
Only the nn.Parameter will show up in the state dict (or in the model.parameters()). This is expected.
And this is also what you want, right? As you want it to be shared across multiple models.
If you want something to be a regular Tensor and saved in the state dict as well, you can use buffers via model.register_buffer("my_buffer", my_tensor).
Hi @ptrblck, @albanD
Thank you for the response. I need to keep these other parameters on the CPU, but buffers in the state_dict are pushed to the GPU when one calls model.cuda(). I would greatly appreciate help on how I can add them as buffers while keeping their values on the CPU; I am working at large scale, and keeping them on the CPU is crucial for me. Thanks!
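One possible workaround, sketched below under the assumption that the frozen tensors are registered buffers: assigning a tensor to a registered buffer's attribute name updates the buffer in place, so after model.cuda() you can move selected buffers back to the CPU while they remain in the state_dict. The model name BigModel is hypothetical. Note the caveat that any later .cuda()/.to() call would move them again, so the reassignment has to be repeated after each such call:

```python
import torch
import torch.nn as nn

class BigModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.x = nn.Parameter(torch.randn(1))            # trainable, fine on GPU
        self.register_buffer("theta0", torch.randn(10))  # large frozen values

model = BigModel()
if torch.cuda.is_available():
    model.cuda()
    # reassigning a registered buffer keeps its state_dict entry,
    # so we can pin it back to the CPU after .cuda() moved everything
    model.theta0 = model.theta0.cpu()

print(model.theta0.device)  # cpu either way
```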