Exclude parameters from model

I have a pretty powerful model (inheriting from nn.Module) that I want to reuse. The problem is that I want to replace some model Parameters with plain Tensors so that gradients flow through them. I need this because in my implementation these properties of the model will be outputs of another model. So I need a way to unregister Parameters with the corresponding names from the model. How can I do this?

I assume you don’t want to retrain these parameters?
If so, you could simply freeze them by setting their .requires_grad attribute to False.
Alternatively, you could register them as buffers via self.register_buffer(name, tensor).
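To illustrate both suggestions, here is a minimal sketch (the class and attribute names are made up for the example):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.randn(4))
        # A buffer is saved in the state_dict but is not returned
        # by .parameters(), so an optimizer never sees it.
        self.register_buffer("scale", torch.ones(4))

net = Net()
net.w.requires_grad_(False)  # freeze the parameter in place

print([name for name, _ in net.named_parameters()])  # ['w']
print("scale" in net.state_dict())                   # True
```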

I want to train the variables used for the construction of these Parameters, so gradients should flow through them. But a call to nn.Parameter(…) breaks the computational graph. So I have to replace the class attribute defined as a Parameter with an attribute of the same name defined as a Tensor.

Could you post a code snippet which shows your use case, please?

I have

```python
class VeryUsefullLegacyParent(nn.Module):
  def __init__(self, n):
    super().__init__()  # required before registering any Parameter
    self.b = nn.Parameter(torch.zeros(n))
```

I want to create

```python
class MyChild(VeryUsefullLegacyParent):
  def __init__(self, a: torch.Tensor):
    super(MyChild, self).__init__(len(a))
    self.b = torch.sigmoid(a)  # raises TypeError: b is already registered as a Parameter
```

My aim is to get a gradient w.r.t. a (and deeper). But I can't replace b in such a straightforward way, because it is already registered as a model parameter. I want to use the same name for this variable, because I have a lot of functions working with VeryUsefullLegacyParent that access it.
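One way to do exactly this is to remove the entry from the module's internal `_parameters` dict before assigning the tensor: `nn.Module.__setattr__` only raises the TypeError while the name is still registered as a parameter. A minimal sketch (note that `_parameters` is a private attribute, so this relies on an implementation detail):

```python
import torch
import torch.nn as nn

class VeryUsefullLegacyParent(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.b = nn.Parameter(torch.zeros(n))

class MyChild(VeryUsefullLegacyParent):
    def __init__(self, a: torch.Tensor):
        super().__init__(len(a))
        # Unregister 'b' so the next assignment stores a plain,
        # non-leaf Tensor instead of raising a TypeError.
        del self._parameters['b']
        self.b = torch.sigmoid(a)

a = torch.randn(3, requires_grad=True)
m = MyChild(a)
m.b.sum().backward()
print(a.grad)  # gradient of sum(sigmoid(a)) w.r.t. a
```

`delattr(self, 'b')` would also remove the registered parameter, since `nn.Module.__delattr__` checks the parameter dict as well.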

Does it mean you would like to get the gradient for a and update it?

You could get the gradients for a, if it’s a parameter.
Why do you want to replace self.b, if it’s also a parameter, which is trainable?

  1. If I construct b = nn.Parameter(a), I won't get the gradient w.r.t. a, because the call to nn.Parameter breaks the computational graph and b becomes a leaf of the graph. (Maybe I'm mistaken, but my experiments showed that it works like this.)
  2. In the MyChild class, b should be a function of a, so b shouldn't be updated via the optimizer. But gradients should flow through b, because I will update a.
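Point 1 can be checked directly: wrapping a tensor in nn.Parameter creates a brand-new autograd leaf, so no gradient reaches the original tensor. A quick sketch:

```python
import torch
import torch.nn as nn

a = torch.zeros(3, requires_grad=True)
b = nn.Parameter(a)   # b is a new leaf sharing a's data
b.sum().backward()

print(b.is_leaf)      # True
print(a.grad)         # None: the graph was cut at b
```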
  1. Yes, this behavior should be correct.

  2. It seems you might be thinking about a static graph definition, as was used e.g. in Theano. In PyTorch you don’t define b as a “method” on a, but instead calculate it dynamically in the forward pass.
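A sketch of that dynamic approach (names are illustrative): keep a as the trainable parameter and recompute b from it in every forward call, so the graph is rebuilt on each pass:

```python
import torch
import torch.nn as nn

class MyChild(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(n))  # trainable leaf

    def forward(self, x):
        b = torch.sigmoid(self.a)  # recomputed each forward pass
        return x * b

m = MyChild(3)
out = m(torch.ones(3))
out.sum().backward()
print(m.a.grad)  # d(out.sum())/da = sigmoid'(a), i.e. 0.25 at a = 0
```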