 What is the most convenient way to make model weights dependent on some parameter?

#1

Imagine I have a neural network `f(x; w)`, where `x` is my input and `w` is its weights vector.
The weights vector `w` itself depends on some parameter `u`, i.e. `w = g(u)`, where `g(.)` is some function. I want to optimize over `u`. What is the most convenient way to do it?

To give a concrete example, let `w = sin(u) / ||u||` (where `||u||` is the norm of `u`), and let the architecture `f(x; w)` be ResNet-18. How can we optimize over `u` in this case? As far as I understand, simply making the model take an additional argument `u` at initialization, computing `w`, and assigning the layers' parameters to `w` will not work. The problem is that each layer stores its weights in an `nn.Parameter()` under the hood, and wrapping a tensor in `nn.Parameter()` turns it into a leaf of the autograd graph, discarding its computation history (we computed `w = torch.sin(u) / torch.norm(u)` and then assigned the layers' parameters to `w`), so gradients for `u` will never be computed. Currently, I see the following two solutions:

• Before each iteration, compute `w` from `u` and update each layer's parameters. Then, once the gradient with respect to `w` has been computed, update `u` manually (feasible if the Jacobian `dw/du` is not too difficult to derive by hand).
• Rewrite ResNet-18 completely from scratch so that it takes the computed `w` as input, uses no `nn.Parameter()`, and calls everything through `torch.nn.functional`. This way the gradient with respect to `u` is computed automatically (similar to TensorFlow before version 2.0).

Both of these solutions are quite tedious. Are there any better alternatives?
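To make the failure mode above concrete, here is a minimal sketch (a single `nn.Linear` standing in for ResNet-18, with made-up shapes): copying the computed `w` into the layer's `nn.Parameter` severs the graph between `u` and the loss, so `u.grad` stays `None`.

```python
import torch
import torch.nn as nn

# u is the parameter we would like to optimize.
u = torch.randn(3, 2, requires_grad=True)
w = torch.sin(u) / torch.norm(u)  # w = g(u); this tensor carries grad history

layer = nn.Linear(2, 3, bias=False)
with torch.no_grad():
    layer.weight.copy_(w)  # copying into an nn.Parameter drops that history

loss = layer(torch.randn(4, 2)).sum()
loss.backward()
print(u.grad)  # None: autograd never reaches u through the copied weight
```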

#2

I went with the second approach. For example, here is my code for a linear layer:

```python
import torch.nn.functional as F

class LinearOp:
    def __init__(self, weights, bias=None):
        self.weights = weights
        self.bias = bias

    def __call__(self, X):
        return F.linear(X, self.weights, self.bias)
```

It's not convenient to write such a class for every layer type (convolutional, batchnorm, etc.), though.
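For completeness, a sketch of how such a wrapper sits in the training loop (shapes, learning rate, and the toy loss are placeholders): `u` is the only leaf tensor handed to the optimizer, and `w` is recomputed inside the loop so it stays on the graph.

```python
import torch
import torch.nn.functional as F

class LinearOp:  # as defined above
    def __init__(self, weights, bias=None):
        self.weights = weights
        self.bias = bias

    def __call__(self, X):
        return F.linear(X, self.weights, self.bias)

u = torch.randn(3, 2, requires_grad=True)
opt = torch.optim.SGD([u], lr=1e-2)
X = torch.randn(8, 2)

for _ in range(5):
    opt.zero_grad()
    w = torch.sin(u) / torch.norm(u)  # w = g(u), recomputed every step
    loss = LinearOp(w)(X).pow(2).mean()
    loss.backward()                   # autograd reaches u through w
    opt.step()

print(u.grad is not None)  # True
```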