I have a pretty powerful model (inherited from nn.Module) that I want to reuse. The problem is that I want to replace some of the model's Parameters with Tensors so that gradients can flow through them. I need this because, in my implementation, these properties of the model will be outputs of another model. So I need a way to unregister Parameters with the corresponding names from the model. How can I do it?
I assume you don’t want to retrain these parameters?
If so, you could simply freeze them by setting their .requires_grad attribute to False.
Alternatively, you could register them as buffers via self.register_buffer.
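For reference, both options would look roughly like this (the nn.Linear module and the buffer name are just placeholders for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder module

# Option 1: freeze an existing parameter (no gradient, no optimizer update)
model.weight.requires_grad = False

# Option 2: register a plain tensor as a buffer; it's not returned by
# model.parameters(), but is moved with .to()/.cuda() and stored in the state_dict
model.register_buffer('offset', torch.zeros(2))  # 'offset' is a made-up name
```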
I want to train the variables used for the construction of these Parameters, so the gradient should flow through them. But a call to nn.Parameter(…) breaks the computational graph. So I have to replace the class property defined as a Parameter with a class property of the same name defined as a Tensor.
Could you post a code snippet which shows your use case, please?
```python
class VeryUsefullLegacyParent(nn.Module):
    def __init__(self, n):
        super().__init__()  # needed before registering parameters
        self.b = nn.Parameter(torch.zeros(n))
```
I want to create
```python
class MyChild(VeryUsefullLegacyParent):
    def __init__(self, a: torch.Tensor):
        super(MyChild, self).__init__(len(a))
        self.b = nn.functional.sigmoid(a)
```
My aim is to get a gradient w.r.t. a (and deeper). But I can’t replace b in such a straightforward way, because it is already registered as a model parameter. I want to use the same name for this variable, because I have a lot of functions working with VeryUsefullLegacyParent that access this variable.
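For context, the direct assignment fails because nn.Module.__setattr__ refuses to overwrite a name that is already registered as a Parameter with a plain Tensor. A small repro of what I’m hitting (the error text is paraphrased and may differ between versions):

```python
import torch
import torch.nn as nn

class VeryUsefullLegacyParent(nn.Module):  # same class as above
    def __init__(self, n):
        super().__init__()
        self.b = nn.Parameter(torch.zeros(n))

a = torch.randn(3, requires_grad=True)
model = VeryUsefullLegacyParent(3)

try:
    model.b = torch.sigmoid(a)  # plain Tensor, but 'b' is a registered Parameter
except TypeError as e:
    print(e)  # roughly: cannot assign 'torch.FloatTensor' as parameter 'b'
```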
Does it mean you would like to get the gradient for a and update it?
You could get the gradients for a, if it’s a parameter.
Why do you want to replace self.b, if it’s also a parameter, which is trainable?
- If I construct b = nn.Parameter(a), I wouldn’t get the gradient w.r.t. a, because the call to nn.Parameter breaks the computational graph and b will be a leaf of the graph. (Maybe I’m mistaken, but my experiments showed that it works like this; a quick check is sketched below.)
- b should be a function of a, so b shouldn’t be updated by the optimizer. But the gradient should flow through b, because I will update a.
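A minimal sketch of the experiment I mean (assuming, as I observed, that nn.Parameter turns its input into a new leaf tensor):

```python
import torch
import torch.nn as nn

a = torch.randn(3, requires_grad=True)

# Wrapping the result in nn.Parameter creates a new leaf tensor,
# so backward stops there and a.grad stays None.
b = nn.Parameter(torch.sigmoid(a))
b.sum().backward()
print(a.grad)  # None

# Computing b as a plain tensor op keeps the graph intact,
# and the gradient reaches a.
b = torch.sigmoid(a)
b.sum().backward()
print(a.grad)  # populated with sigmoid'(a)
```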
Yes, this behavior should be correct.
It seems you might be thinking about a static graph definition, as was used e.g. in Theano. In PyTorch you don’t define b as a “method” on a, but instead calculate it dynamically in the forward pass.
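One possible sketch of that idea, combined with unregistering the inherited parameter so the name b can hold a plain tensor (the del self._parameters['b'] step is just one way to free the name; adapt as needed):

```python
import torch
import torch.nn as nn

class VeryUsefullLegacyParent(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.b = nn.Parameter(torch.zeros(n))

class MyChild(VeryUsefullLegacyParent):
    def __init__(self, n):
        super().__init__(n)
        # unregister the inherited parameter so 'b' can later hold a plain tensor
        del self._parameters['b']

    def forward(self, a):
        # 'a' could be the output of another model; the gradient flows
        # through b back into a (and deeper)
        self.b = torch.sigmoid(a)
        return self.b

a = torch.randn(3, requires_grad=True)  # stand-in for another model's output
model = MyChild(3)
model(a).sum().backward()
print(a.grad)  # gradients reach a through b
```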