I have a pretty powerful model (inherited from nn.Module) that I want to reuse. The problem is that I want to replace some of the model's Parameters with Tensors so that gradients can flow through them. I need this because, in my implementation, these properties of the model will be outputs of another model. So I need a way to unregister Parameters with the corresponding names from the model. How can I do it?
I assume you don’t want to retrain these parameters?
If so, you could simply freeze them by setting their .requires_grad attribute to False.
Alternatively, you could register them as buffers via self.register_buffer.
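For reference, both options would look roughly like this (the nn.Linear module and the buffer name are just placeholders for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder module

# Option 1: freeze an existing parameter (no gradient, no optimizer update)
model.weight.requires_grad = False

# Option 2: register a plain tensor as a buffer; it's not returned by
# model.parameters(), but is moved with .to()/.cuda() and stored in the state_dict
model.register_buffer('offset', torch.zeros(2))  # 'offset' is a made-up name
```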
I want to train the variables used for the construction of these Parameters, so the gradient should flow through them. But a call to nn.Parameter(…) breaks the computational graph. So I have to replace the class property defined as a Parameter with a class property of the same name defined as a Tensor.
Could you post a code snippet which shows your use case, please?
```python
class VeryUsefullLegacyParent(nn.Module):
    def __init__(self, n):
        super().__init__()  # needed before registering parameters
        self.b = nn.Parameter(torch.zeros(n))
```
I want to create
```python
class MyChild(VeryUsefullLegacyParent):
    def __init__(self, a: torch.Tensor):
        super(MyChild, self).__init__(len(a))
        self.b = nn.functional.sigmoid(a)
```
My aim is to get a gradient w.r.t. a (and deeper). But I can’t replace b in such a straightforward way, because it is already registered as a model parameter. I want to use the same name for this variable, because I have a lot of functions working with VeryUsefullLegacyParent that access this variable.
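For context, the direct assignment fails because nn.Module.__setattr__ refuses to overwrite a name that is already registered as a Parameter with a plain Tensor. A small repro of what I’m hitting (the error text is paraphrased and may differ between versions):

```python
import torch
import torch.nn as nn

class VeryUsefullLegacyParent(nn.Module):  # same class as above
    def __init__(self, n):
        super().__init__()
        self.b = nn.Parameter(torch.zeros(n))

a = torch.randn(3, requires_grad=True)
model = VeryUsefullLegacyParent(3)

try:
    model.b = torch.sigmoid(a)  # plain Tensor, but 'b' is a registered Parameter
except TypeError as e:
    print(e)  # roughly: cannot assign 'torch.FloatTensor' as parameter 'b'
```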
Does it mean you would like to get the gradient for a and update it?
You could get the gradients for a, if it’s a parameter.
Why do you want to replace self.b, if it’s also a parameter, which is trainable?
- If I construct b = nn.Parameter(a), I wouldn’t get the gradient w.r.t. a, because the call to nn.Parameter breaks the computational graph and b will be a leaf of the graph. (Maybe I’m mistaken, but my experiments showed that it works like this; a quick check is sketched below.)
- b should be a function of a, so b shouldn’t be updated by the optimizer. But the gradient should flow through b, because I will update a.
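A minimal sketch of the experiment I mean (assuming, as I observed, that nn.Parameter turns its input into a new leaf tensor):

```python
import torch
import torch.nn as nn

a = torch.randn(3, requires_grad=True)

# Wrapping the result in nn.Parameter creates a new leaf tensor,
# so backward stops there and a.grad stays None.
b = nn.Parameter(torch.sigmoid(a))
b.sum().backward()
print(a.grad)  # None

# Computing b as a plain tensor op keeps the graph intact,
# and the gradient reaches a.
b = torch.sigmoid(a)
b.sum().backward()
print(a.grad)  # populated with sigmoid'(a)
```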
Yes, this behavior should be correct.
It seems you might be thinking about a static graph definition, as was used e.g. in Theano. In PyTorch you don’t define b as a “method” on a, but instead calculate it dynamically in the forward pass.
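One possible sketch of that idea, combined with unregistering the inherited parameter so the name b can hold a plain tensor (the del self._parameters['b'] step is just one way to free the name; adapt as needed):

```python
import torch
import torch.nn as nn

class VeryUsefullLegacyParent(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.b = nn.Parameter(torch.zeros(n))

class MyChild(VeryUsefullLegacyParent):
    def __init__(self, n):
        super().__init__(n)
        # unregister the inherited parameter so 'b' can later hold a plain tensor
        del self._parameters['b']

    def forward(self, a):
        # 'a' could be the output of another model; the gradient flows
        # through b back into a (and deeper)
        self.b = torch.sigmoid(a)
        return self.b

a = torch.randn(3, requires_grad=True)  # stand-in for another model's output
model = MyChild(3)
model(a).sum().backward()
print(a.grad)  # gradients reach a through b
```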