Imagine I have a neural network f(x; w), where x is my input and w is its weight vector. The weight vector w itself depends on some parameter u, i.e. w = g(u), where g(.) is some function. I want to optimize over u. What is the most convenient way to do it?
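To make the setup concrete: as long as w is produced from u by differentiable tensor operations, autograd can trace gradients back to u. A minimal sketch with a toy shape and no actual network:

```python
import torch

# u is the parameter we actually want to optimize
u = torch.randn(5, requires_grad=True)

# w = g(u): built from differentiable ops, so it stays in the autograd graph
w = torch.sin(u) / torch.norm(u)

# any scalar loss built from w backpropagates all the way to u
loss = (w ** 2).sum()
loss.backward()
print(u.grad)  # gradient of the loss with respect to u
```

The difficulty described below is not with g itself, but with getting w into the layers of a real model without severing this graph.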
To give an example, let w = sin(u) / ||u|| (here ||u|| is the norm of u), and let the model architecture f(x; w) be ResNet-18. How can we optimize over u in this case? As far as I understand, if I just make the model take an additional parameter u during initialization, compute w, and set the layers' parameters to w, this will not work. The problem is that each layer uses nn.Parameter() under the hood, and nn.Parameter() ignores the history of computation (we have computed w = torch.sin(u) / torch.norm(u) and then set the layers' parameters to w), so gradients for u will not be computed. Currently, I see the following two solutions:
- Before each iteration, compute w from u and update each layer's parameters. Then, when the gradient with respect to w is computed, update u manually (if the gradient dw/du is not too difficult to compute by hand).
- Rewrite ResNet-18 completely from scratch so that it takes the computed w as input, does not use nn.Parameter(), and calls everything via torch.nn.functional. This way the gradient with respect to u will be computed automatically (so it's like TensorFlow before version 2.0).
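For a single linear layer (not the full ResNet-18), the second option would look roughly like this; the shapes here are made up purely for illustration:

```python
import torch
import torch.nn.functional as F

u = torch.randn(3, 4, requires_grad=True)  # the underlying parameter
x = torch.randn(2, 4)                      # a toy input batch

# compute w = g(u) and feed it straight into a functional layer,
# so no nn.Parameter ever detaches it from the graph
w = torch.sin(u) / torch.norm(u)
y = F.linear(x, w)                         # y = x @ w.T

loss = y.pow(2).mean()
loss.backward()                            # u.grad is filled in automatically
```

Doing this for every conv, batch-norm, and linear layer of ResNet-18 is exactly the tedium in question.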
Both of these solutions are quite tedious. Are there any better alternatives?
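As an aside on the first option: the manual dw/du step can itself be delegated to autograd via a vector-Jacobian product, so no hand-derived Jacobian is needed. A sketch for one linear layer (toy shapes, not ResNet-18):

```python
import torch

u = torch.randn(3, 4, requires_grad=True)
layer = torch.nn.Linear(4, 3, bias=False)
x = torch.randn(2, 4)

# 1) compute w = g(u); keep the u -> w graph alive
w = torch.sin(u) / torch.norm(u)

# 2) copy w into the layer's nn.Parameter (this copy is what breaks the graph)
with torch.no_grad():
    layer.weight.copy_(w)

# 3) ordinary forward/backward; the gradient lands on layer.weight
loss = layer(x).pow(2).mean()
loss.backward()

# 4) push dL/dw back through g with a vector-Jacobian product
(u_grad,) = torch.autograd.grad(w, u, grad_outputs=layer.weight.grad)
```

This avoids writing dw/du by hand, but one still has to repeat the copy-and-propagate dance for every parameter tensor on every iteration.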